Journal of Experimental Education 


Volume VIII 


DECEMBER, 1939 


Number 2 


In previous studies with children of pre- 
school age (1)(2) certain hypotheses regard- 
ing the dynamic nature of human relations 
were verified for an experimental play situa- 
tion. It was held hypothetically, however, 
that the same principles of human behavior 
applied as well to the social and psychological 
interplay of children of all ages and to adult-
child relations. During the past year reliable
techniques have been developed for measur- 
ing the behavior of kindergarten teachers in 
terms of the defined concepts of domination 
and social integration. These studies have 
been reported elsewhere. (3) (4) 

The investigation reported herewith consti- 
tuted a study of dominative and of socially 
integrative behavior of kindergarten children 
in an experimental play situation. The meth- 
ods, materials and definitions of terms were 
the same as for the previous study of pre- 
school children. The results have been exam- 
ined first, for similarities and differences in 
the behavior of the kindergarten children as 
compared with the children used in the pre- 
vious study; and second, for the relation of 
the findings on kindergarten children to the 
formulated hypotheses regarding the dynamic
nature of behavior.

Definitions of Terms. Briefly, “The use of
force, commands, threats, shame, blame, at-
tacks against the personal status of an indi-
vidual are called dominative techniques of
responding to others. Domination is charac-
terized by a rigidity or inflexibility of pur-
pose, by an unwillingness to admit the con-
tribution of another’s experience, desires, pur-
poses or judgment in the determining of goals 
which concern others. — Domination stifles 
differences; domination attempts to make
others behave according to one’s own stand-
ards or purposes.—Domination involves force

* Read at the meetings of the Midwestern Psychological
Association, May 5, 1939, University of Nebraska, Lincoln.


DOMINATION AND SOCIAL INTEGRATION IN THE BEHAVIOR
OF KINDERGARTEN CHILDREN IN AN
EXPERIMENTAL PLAY SITUATION*


HAROLD H. ANDERSON
Department of Psychology, University of Illinois


or threats of force or of some other form of 
the expenditure of energy against another.— 
(Domination is self-protective; it)—is be- 
havior of one who is so insecure that he is 
not free to utilize new data, new information, 
new experience. Domination is an attempt at 
atomistic living; the desires, purposes, stand- 
ards, values, judgment, welfare of others do 
not count; it is rugged individualism of a 
highly in-growing order. Domination is the 
antithesis of the scientific attitude; it is an 
expression of resistance against change (or 
differences); it is consistent with bigotry and
with autocracy. It is the technique of a dic- 
tatorship.” 


It should be pointed out that the terms 
dominate, dominative, dominating, and domi- 
nation are used here to designate techniques 
of behaving. As such they are not to be con- 
fused with the terms dominant and domi- 
nance. These latter terms refer to defined 
relationships which may or may not be 
achieved by means of the techniques of 
domination or the use of force or coercion. 


If, instead of compelling the companion to 
do as one says, one asks the companion and 
by explanation makes the request meaningful 
to the other so that the other can voluntarily 
cooperate, such behavior is said to be an ex- 
pression not so much of pursuing one’s own 
unique purposes as attempting to discover and 
get satisfactions through common purposes. 
For such expenditure of energy in common 
purposes, for an attempt to reduce instead of 
augment or incite conflict of differences the 
term integrative behavior is used. The person 
who can change his mind when confronted 
with new evidence which has grown out of 
the experience of another is said to be in- 
tegrating differences. Integrative behavior as 
the term is used here is consistent with the 
scientific point of view, the objective ap-
proach. It designates behavior that is flexible,
growing, learning.

The term integration is not used here as it 
has been used by some in contrast with dif-
ferentiation. It is believed that the two
processes are inseparable and are merely dif-
ferent aspects of the same psychological or
biological phenomenon. With the integration 
of differences something new is created that 
never has existed before; this emergence of 
originals through the integration of differences 
is itself a differentiation. 

Integrative behavior is thus consistent with 
concepts of growth and learning. It makes 
allowance in one’s own behavior for differ- 
ences in others. Whereas domination stifles 
or frustrates the achievement of individual 
differences, socially integrative behavior re- 
spects differences, advances the psychological 
processes of differentiation. Integrative be-
havior is flexible, adaptive, objective, scien-
tific, cooperative. It is an expression of the
operation of the democratic processes.

Hypotheses. The previous experimental 
studies offered consistent evidence in support 
of the hypotheses that: 


1. Domination in one child incites domina- 
tive techniques in the companion. 

2. Integrative behavior induces cooperation
or integrative behavior in the com-
panion.

3. Domination as a psychological technique 
of behaving is not only different from, 
but where a potential avenue of escape 
is left open, it is dynamically unrelated 
to integrative behavior. 


It is assumed that short of the extermina-
tion of another individual there is no situation
in which the interplay is entirely dominative
and also that there is no situation in which
the interplay is entirely integrative. But
many situations arise in which the techniques
of responding to others can be reliably said
to be expressions of domination or of integra-
tive behavior.


Group

RECORD OF DOMINATION-INTEGRATIVE BEHAVIOR
Department of Psychology, University of Illinois

                                                                  A    B
 1. Verbal demands to secure materials
 2. Forceful attempts to secure materials
 3. Succeeds in securing materials
 4. Defends, snatches back materials
 5. Verbal commands to direct c’s behavior
 6. Forceful attempts to direct behavior
 7. Succeeds in directing behavior
 8. Criticizes, reproves companion
 9. Shows common purpose by word or action
10. Verbal request or suggestion to direct c’s behavior or secure
    material
11. Complies with request or suggestion
12. Sets pattern including gesture which companion imitates

Dom. Scores (1-8)
Integ. Scores (9-12 plus c-11)

Figure 1.


TABLE 1


NUMBERS OF CHILDREN USED IN THE
PAIRED COMPARISONS


                                                Total
School    Session    Boys    Girls    Total    Pairings
X         A.M.         8       11       19        30
X         P.M.         8       11       19        96
Y         A.M.         5        6       11        50
Totals                21       28       49       176


Subjects. The subjects used in this study 
were forty-nine children, twenty-one boys 
and twenty-eight girls, from three kinder- 
garten groups in two schools. Tabulation of 
numbers according to sex and group together 
with the number of pairings is given in 
Table 1. Most of the children came from 
homes of superior socio-economic status and 
practically all were of normal or superior in- 
telligence as measured by the revised 
Stanford-Binet mental tests. 

Methods and Procedure. Children were 
taken at random in pairs to the testing room 
where they were allowed to play for five 
minutes. In the room was a sand box on a 
low table. In the sand box were a toy sand 
pail, shovel, sieve, two automobiles, and three 
rubber toy animals always arranged in the 
same position when the children arrived. As 
he led the children into the room the experi- 
menter said, “Here are some toys for you to 
play with until I come back and get you. We 
will keep them in the sand box all the time, 
but you may play with anything you want 
to.” In the morning group of school X most 
children had two pairings. In the afternoon 
all children with few exceptions were paired 
with five others. In school Y also most of the 
children were paired five times. 

Methods of Scoring. The observation blank 
shown in Figure 1 is based on the blank which 
was developed for the previous study (2, pp. 
350-351). Domination scores in a pairing in- 
cluded all tallies in items 1-8 inclusive. Inte- 
gration scores included all tallies for items 
9-12 inclusive plus the number of the com- 
panion’s tallies for item 11. The definitions of 
categories of domination and of integrative 
behavior in children’s responses to each other 
were taken exactly from the previous study
and are not reproduced here (2, pp. 351-
354).


TABLE 2


COEFFICIENTS OF CORRELATION BETWEEN HIGH
AND LOW SCORES PER PAIRING FOR CHILD’S
WITH COMPANION’S DOMINATION SCORES AND
INTEGRATION SCORES RESPECTIVELY; COEFFI-
CIENTS OF CORRELATION BETWEEN DOMINA-
TION SCORES AND CHILD’S OWN INTEGRATION
SCORES AND BETWEEN DOMINATION SCORES
AND COMPANION’S INTEGRATION SCORES


School        Session          N        r
Domination with Companion’s Domination
X             A.M. & P.M.      63      .85 ± .02
Y             A.M.             24      .82 ± .05
Preschool*                    513      .68 ± .02
Integration with Companion’s Integration
X             A.M. & P.M.      63      .65 ± .05
Y             A.M.             24      .72
Preschool*                    514      .82 ± .01
Domination with Own Integration
X             A.M.             30      .16 ± .12
X             P.M.             96     -.08 ± .07
X             A.M. & P.M.     126     -.01 ± .06
Y             A.M.             48      .16 ± .09
Preschool*                   1030     -.07 ± .02
Domination with Companion’s Integration
X             A.M.             30      .18 ± .12
X             P.M.             96     -.23 ± .07
X             A.M. & P.M.     126     -.11 ± .06
Y             A.M.             48      .04 ± .10
Preschool*                   1030     -.10 ± .02


* Data on preschool children obtained from
Anderson (2).
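The scoring rule stated under Methods of Scoring can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `pairing_scores`, the dictionary layout, and the tallies for the two children are hypothetical and are not taken from the study.

```python
DOMINATION_ITEMS = range(1, 9)     # items 1-8 on the observation blank
INTEGRATION_ITEMS = range(9, 13)   # items 9-12 on the observation blank
COMPANION_ITEM = 11                # companion's item 11 is added to the child's score


def pairing_scores(child_tallies, companion_tallies):
    """Return (domination, integration) for one child in one pairing.

    Each argument maps an item number (1-12) to the number of tallies
    recorded for that child during the play period.
    """
    domination = sum(child_tallies.get(item, 0) for item in DOMINATION_ITEMS)
    integration = sum(child_tallies.get(item, 0) for item in INTEGRATION_ITEMS)
    integration += companion_tallies.get(COMPANION_ITEM, 0)
    return domination, integration


# Hypothetical tallies for children A and B in a single pairing.
child_a = {1: 2, 5: 1, 9: 10, 10: 4, 12: 1}
child_b = {4: 1, 9: 8, 11: 3}

score_a = pairing_scores(child_a, child_b)
score_b = pairing_scores(child_b, child_a)
print(score_a, score_b)
```

Each pairing therefore yields two domination scores and two integration scores, one pair for each child.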

Reliability of Observations by Two
Observers.¹ The coefficients of reliability of
simultaneous and independent recordings by
two observers were .98 ± .02 for domination
scores on seventy-eight pairings and .90 ±
.02 for integration scores on seventy pairings.

The Dynamic Nature of Interactivity as 
Shown in Domination and Integration Scores. 
One way of showing whether there is a 
dynamic relation in domination scores is to 
correlate the high domination scores in each 
pairing with the domination scores of the 
companion. 
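The procedure just described can be illustrated as follows. This is a minimal sketch, assuming the coefficient is the ordinary product-moment correlation; the function names and the six pairs of domination scores are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


def high_low_correlation(pairings):
    """Correlate the higher score in each pairing with the companion's lower score."""
    highs = [max(a, b) for a, b in pairings]
    lows = [min(a, b) for a, b in pairings]
    return pearson_r(highs, lows)


# Hypothetical domination scores (child A, child B) for six pairings.
pairings = [(0, 1), (3, 2), (7, 5), (1, 0), (12, 9), (4, 4)]
r = high_low_correlation(pairings)
print(round(r, 2))
```

A coefficient near 1.00 would mean that pairings in which one child dominated heavily were also the pairings in which the companion dominated heavily.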

Table 2 gives the coefficients of correlation
of high domination scores with low domina-
tion scores for sixty-three pairings in school
X and for twenty-four pairings in school Y.
The respective coefficients of correlation are
.85 and .82. These coefficients show an even
higher relationship between domination in
one child and domination in the companion
than was found in a much larger number of
pairings of preschool children. Table 2 in-
cludes also the previously reported coefficient
of .68 ± .02 obtained from 513 pairings of
preschool children (2, p. 395).

¹ The writer was assisted by Gladys Lowe Anderson.
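The paper does not say how the error term attached to each coefficient was computed. A plausible reading, offered here only as an assumption, is that it is the probable error of r, 0.6745(1 - r²)/√N. The sketch below applies that formula to the three domination coefficients; the function name is hypothetical.

```python
import math


def probable_error_of_r(r, n):
    """Probable error of a correlation coefficient (assumed formula)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Coefficients and numbers of pairings quoted in the text above.
reported = [("School X", 0.85, 63), ("School Y", 0.82, 24), ("Preschool", 0.68, 513)]
errors = {label: round(probable_error_of_r(r, n), 2) for label, r, n in reported}
print(errors)
```

Under this assumption the computed values agree with the .02, .05, and .02 attached to these coefficients in Table 2.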

Table 2 gives also the coefficients of cor- 
relation of high with low integration scores for 
kindergarten children in schools X and Y, 
together with the coefficient obtained in the 
previous study with preschool children. Sixty- 
three pairings in school X yielded a coefficient 
of correlation of .65 and twenty-four pairings 
in school Y showed a coefficient of .72. These 
coefficients are lower than, but entirely con-
sistent with, the coefficient of .82 ± .01
obtained from 514 pairings of preschool chil- 
dren (2, p. 397). 

This study thus constitutes additional evi- 
dence in support of the findings of the pre- 
vious study. The first two hypotheses men- 
tioned above, that domination incites domina- 
tion and that integrative behavior in one child 
tends to induce integrative behavior in the 
companion, have received consistent support 
in both investigations. 

The Relation of Domination to Integration.
Domination scores were also correlated with 
the child’s own integration scores and with 
the companion’s integration scores. Table 2 
gives these coefficients for the kindergarten 
children and reproduces the coefficients 
obtained previously with preschool children. 

Without exception all the coefficients show 
that in this experimental situation a child’s 
domination scores are unrelated either to his 
own integration scores or to the integration 
scores of his companion. 

Sex Comparisons. Résumé of Previous
Findings. In the previous study of children 
of preschool age three groups of children were 
used; one group of Iowa City children supe- 
rior in intelligence and coming from homes 
of superior socio-economic status, and two 
groups of children in an orphanage, one of 
which (preschool group) attended a nursery 
school, the other (control) group living in the 
orphanage but not attending nursery school. 
When data from all three groups were com- 
bined to show sex comparisons of mean dom- 
ination scores and mean integration scores it 
was found that girls were more dominating 
than boys, the critical ratio between the ob- 
tained difference and the standard error of 
the difference being 4.60. It was found also
that girls were less integrative than boys,
there being ninety-eight chances in one hun- 
dred that the obtained difference represented 
a true difference greater than zero. 

Other findings of sex differences in the 
orphanage children alone may be summarized 
as follows: 


a. The nursery school and control groups
of orphanage children showed a consist-
ency of scores in their sex comparisons:
they differed only in degree.

b. Girls were more dominating than boys.

c. When sexes were cross-paired, girls de-
creased in dominative behavior and boys
increased in dominative behavior, the
difference in the former case being sig-
nificant; in the latter case there were
ninety-two chances in one hundred that
the obtained difference represented a
true difference greater than zero.

d. Girls and boys were not significantly
different in integrative behavior when
paired with their own sex.

e. When the sexes were cross-paired, both
boys and girls showed decreases in
integrative behavior.

f. Boys were slightly more integrative in
the cross-pairings of sexes than were
girls though the difference was not sig-
nificant.

g. In cross-pairings, girls showed lower
scores than boys in both dominative
and integrative behavior. This was true
for both nursery school and control
groups of children.

h. As regarded total interactivity (the sum
of domination and integration scores),
the data presented indicated that among
these small orphanage children sex dif-
ferences were already established (2,
pp. 376-388).


Sex Comparisons. Findings of the Present
Study. Table 3 gives for the kindergarten
children in the present study the mean domi-
nation and mean integration scores respec-
tively, together with standard errors of the
means and Domination-Integration ratios, for
boys when paired with boys, boys when
paired with girls, girls when paired with boys,
and girls when paired with girls. Table 3
shows also for different comparisons of mean
scores the obtained differences between mean
scores, standard errors of the differences, crit-
ical ratios, and the chances in 100 that the
obtained differences represent true differences
greater than zero.


TABLE 3


SEX COMPARISONS IN DOMINATION AND INTEGRATION SCORES. MEANS, STANDARD ERRORS OF
THE MEANS, AND DOMINATION-INTEGRATION RATIOS FOR BOYS WHEN PAIRED WITH BOYS,
BOYS WHEN PAIRED WITH GIRLS, GIRLS WHEN PAIRED WITH BOYS, AND FOR GIRLS WHEN
PAIRED WITH GIRLS


                                        Domination         Integration
                           Number                S.E.               S.E.     D-I
School X, A.M. + P.M.      Pairings     Mean     Mean     Mean      Mean     Ratio
Boys with boys                22        4.05     1.09     37.64     3.38     .11
Boys with girls               31        2.90      .68     34.39     2.00     .08
Girls with boys               31        2.87      .67     35.10     1.82     .08
Girls with girls              42        1.43              36.88              .04

                                                     Standard
                           Higher                    Error of      Critical   Chances
School X, A.M. + P.M.      Mean       Difference     Difference    Ratio      in 100
Domination
BB with GG                 BB           2.62           1.14         2.30        99
BG with GB                 BG            .03            .96          .03        52
BB with BG                 BB           1.15           1.28          .90        82
GB with GG                 GB           1.44            .75         1.92        97
BB with GB                 BB           1.18           1.28          .92        82
BG with GG                 BG           1.47            .76         1.93        97
Integration
BB with GG                 BB            .76           3.80          .20        58
BG with GB                 GB            .71           2.70          .26        60
BB with BG                 BB           3.25           3.91          .83        79
GB with GG                 GG           1.78           2.54          .70        76
BB with GB                 BB           2.54           3.82          .66        74
BG with GG                 GG           2.49           2.67          .93        83


The symbols “BB with GG” indicate that the mean score for boys when paired with boys
is being compared with the mean score for girls when paired with girls. “BG with GB” indi-
cate that the mean score for boys when paired with girls is being compared with the mean
score for girls when paired with boys.
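The quantities named in this paragraph can be reproduced from the means and standard errors alone. The sketch below is an illustration under assumptions: it takes the standard error of a difference to be the root of the summed squared standard errors, and reads "chances in 100" as a one-sided normal probability. Neither formula is stated in the paper, and the function names are hypothetical. The example uses the integration means for girls paired with boys (35.10, S.E. 1.82) and boys paired with girls (34.39, S.E. 2.00).

```python
import math


def standard_error_of_difference(se_a, se_b):
    """Standard error of the difference between two independent means."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


def critical_ratio(mean_a, mean_b, se_a, se_b):
    """Obtained difference divided by its standard error."""
    return abs(mean_a - mean_b) / standard_error_of_difference(se_a, se_b)


def chances_in_100(ratio):
    """Chances in 100 of a true difference greater than zero (normal curve assumed)."""
    return 100 * 0.5 * (1 + math.erf(ratio / math.sqrt(2)))


se = standard_error_of_difference(1.82, 2.00)
cr = critical_ratio(35.10, 34.39, 1.82, 2.00)
chances = chances_in_100(cr)
print(round(se, 2), round(cr, 2), round(chances))
```

For this comparison the sketch agrees with the 2.70, .26, and 60 given in Table 3. Other rows may differ by a point in the last column, presumably because the original values were read from printed tables.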

a. Domination. Figure 2 shows graphically 
the mean domination scores given in Table 3, 
together with a reproduction to scale of the 
mean domination scores of the orphanage 
nursery school and control groups taken from 
Table 21 and Figure 8 of that study. (2, pp. 
380, 379) 

1. Résumé of the Domination Scores of 
Orphanage Children. In the previous study of 
orphanage children it was found that when a 
more dominating group was cross-paired with 
a less dominating group there was a dynamic 
relation which was sufficient to indicate a 
tendency toward over-compensation. This 
dynamic relation together with the over- 
compensation was shown both in the cross- 
pairings of the more dominating orphanage 
nursery school children with the less dominat-
ing control group children and in the cross-
pairings of the more dominating girls with the
less dominating boys. In the cross-pairings of 
the sexes the orphanage control group showed 
the same tendencies as the orphanage nursery 
school children though the changes did not 
represent significant differences. By way of 
explanation it was said in the previous report 
that: “When the little sheep is paired with 
the big, bad wolf, the little sheep becomes not 
only less sheepish than the sheep but more 
wolfish than the wolf. And the wolf becomes 
more sheepish than the sheep.” (2, p. 368). 


2. School X, Morning and Afternoon 
Groups Combined. When the data for the 
morning session were combined with the data 
for the pairings in the afternoon session, the 
dynamic relationship of dominative behavior 
to dominative behavior was still present al-
though there was no tendency toward over-
compensation in the cross-pairings.


A glance at the domination scores repre-
sented in Figure 2 shows that the lowest mean
domination scores were obtained by girls
when paired with girls. The differences be-
tween these scores and the domination scores
represented by the three other columns ap-
proach statistical significance in each case.
As compared with mean domination scores of
boys when paired with boys there are ninety-
nine chances in 100 that the obtained differ-
ence represents a true difference greater than
zero. As compared with the scores for girls
when paired with boys there are ninety-seven
chances in 100 that the obtained difference is
statistically significant.

[Figure 2. Mean domination scores. Panels: School X, A.M. and P.M.;
Orphanage Nursery School; Orphanage Control Group. Columns in each
panel: boys with boys, boys with girls, girls with boys, girls with girls.]

b. Integration. Figure 3 shows graphically 
the mean integration scores given in Table 3, 
together with a reproduction to scale of the 
mean integration scores of the orphanage 
nursery school and control groups taken from 
Table 21 and Figure 9 of that study. (2, pp. 
380, 384). In comparing domination scores 
with integration scores it should be noted that 
Figures 2 and 3 are not drawn to the same 
vertical scale. One unit of measurement in 
Figure 2 is equal to six units in Figure 3. 

1. Résumé of the Study of the Orphanage
Children. In the study of orphanage children
the nursery school group was found to be less
integrative than the control group in the com-
parison of homogeneous or own-group pair-
ings. In the cross-pairings between nursery
and control group children the same dynamic
tendencies toward over-compensation were
found as were revealed in dominative be-
havior: the low integrative group increased in
integration scores and the high integrative
group decreased so that the relationship of
low to high was reversed.

In the cross-pairings of sexes, however, the
dynamic tendency toward over-compensation
found in mean domination scores was absent
in integrative behavior. In both nursery
school and control groups the boys when
paired with boys were not significantly dif-
ferent from girls when paired with girls. In
the cross-pairings of the sexes the mean in-
tegration scores of both boys and girls
dropped for both the nursery school and con-
trol group orphanage children. In the control
group the drop in boys’ mean score had
ninety-six chances in 100 of representing a
significant difference; the drop in girls’ mean
integration score in the cross-pairings had
ninety-nine chances in 100 of representing a
true difference greater than zero. A justifiable
inference is that these decreases in integra-
tive behavior represent an inhibition or shy-
ness in the cross-pairings not present in the
[own-sex pairings].

2. School X, Morning and Afternoon Ses-
sion Combined. When the integration scores
for morning and afternoon groups were com-
bined for greater numbers, the mean scores
for own-sex pairings of boys and girls were not
significantly different. In the cross-pairings
the mean scores for both sexes dropped. To
this extent the data are similar to the data
obtained for both orphanage groups. The de-
creases in mean integration scores are, how-
ever, not significant.

[Figure 3. Mean integration scores. Panels: School X, A.M. and P.M.;
Orphanage Nursery School; Orphanage Control Group. Columns in each
panel: boys with boys, boys with girls, girls with boys, girls with girls.]

Domination-Integration Ratios. D-I ratios
are obtained by dividing domination scores
by integration scores. In the measurement of
teachers’ contacts with children in another
study (3) it was found that with very few
exceptions the teachers used dominating tech-
niques much more frequently than they used
integrative techniques. Table 3 shows that in 
the contacts which these kindergarten children 
had with each other the reverse ratio obtained. 
With the exception of the behavior of the 
older boys in the afternoon session of school 
X there are no mean scores in which domina- 
tion is even one-tenth the mean integration 
score. 
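The ratio defined at the start of this section is a simple quotient, and the figures in Table 3 can be checked directly. The sketch below uses three of the mean scores from that table; the function name and the dictionary layout are illustrative only.

```python
def di_ratio(mean_domination, mean_integration):
    """Domination-Integration ratio: domination score divided by integration score."""
    return mean_domination / mean_integration


# (mean domination, mean integration) as given in Table 3.
table_3_means = {
    "Boys with girls": (2.90, 34.39),
    "Girls with boys": (2.87, 35.10),
    "Girls with girls": (1.43, 36.88),
}

ratios = {
    pairing: round(di_ratio(domination, integration), 2)
    for pairing, (domination, integration) in table_3_means.items()
}
print(ratios)
```

All three ratios fall below one-tenth, as the text states for these children.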


The D-I ratios of these kindergarten chil- 
dren were different also from those shown in 
the previous study of children in the orphan- 
age. In that study the D-I ratios were not 
presented as such, but they were easily ob- 
tained (2, p. 380). For the orphanage nursery 
school children the D-I ratios ranged from 
.54 to 1.19 and for the orphanage control 
group the D-I ratios ranged from .34 to .70. 
These ratios are without exception higher than 
those obtained with the kindergarten children 
in the present study. It should be mentioned 
that almost without exception they are in turn 
not only lower but considerably lower than 
the D-I ratios for teachers’ contacts with 
these same kindergarten children. 


Relation of Domination and Integration
Scores to Mental Age and Chronological Age.
When mental age is correlated with the child’s
domination scores, the coefficient is -.44 ±
.09. This coefficient indicates for the kinder-
garten children in this study a consistent and
slightly higher relationship as compared with
the data in the previous study with children
of preschool age (2, p. 389).

With chronological age domination scores
show a zero relation. The coefficient of corre-
lation is .09 ± .11. In the previous study one
thousand pairings of preschool children
yielded the more definitive zero coefficient of
.00 ± .02. With 128 pairings of preschool
children, however, when mental age was held
constant the coefficient of correlation between
chronological age and domination scores was
.24 ± .06 (2, p. 389).




The integration scores for kindergarten
children in the experimental play situation
show a coefficient of .17 ± .11 when cor-
related with mental age. In the previous
study, among 872 pairings a slight positive
relation was shown in the coefficient of .30 ±
.03. When chronological age was held con-
stant for 128 pairings a coefficient of .34 ±
.05 was obtained (2, p. 392).

The kindergarten children in the present 
study show a zero relation between their in- 
tegrative scores and chronological age. The 
coefficient of correlation is .10 ± .11. This is
consistent with a zero relation obtained for 
preschool children on 975 pairings (2, p. 392). 


SUMMARY 


The present study compared the psycho- 
logical interplay of kindergarten children in 
an experimental play situation with data pre- 
viously reported for preschool children. Inter- 
play was recorded in terms of domination and 
integrative behavior as defined in previous 
studies. 


As in the previous study of preschool chil-
dren there were no sex differences in integra-
tion scores.


The kindergarten boys were found to be
more dominating than the kindergarten girls,
these findings constituting a reversal of sex
differences obtained in the previous data on
preschool children.


Notwithstanding the reversal in sex differ- 
ences in domination scores the data obtained 
from the kindergarten children gave only con- 
sistent support to the hypotheses advanced for 
the previous study, that: 


1. Domination in one child incites domina- 
tive techniques in the companion. 


2. Integrative behavior induces cooperative 
or integrative behavior in the companion. 


3. Domination as a psychological technique
of behaving is not only different from,
but where a potential avenue of escape
is left open, it is dynamically unrelated
to integrative behavior.


Domination and integration scores in the
behavior of the kindergarten children showed
either very low or zero relationships with
chronological age and mental age.


REFERENCES

1. Anderson, Harold H. An experimental
study of dominative and integrative be-
havior in children of preschool age. Jour-
nal of Social Psychology, Vol. 8, No. 3,
August 1937, pp. 335-345.

2. Anderson, Harold H. Domination and
integration in the social behavior of young
children in an experimental play situation.
Genetic Psychology Monographs, Vol. 19,
No. 3, August 1937, pp. 341-408.

3. Anderson, Harold H. The measurement
of domination and of socially integrative
behavior in teachers’ contacts with chil-
dren. Child Development, Vol. 10, No. 2,
June 1939, pp. 73-89.

4. Anderson, Harold H. Domination and
social integration in the behavior of kinder-
garten children and teachers. Genetic Psy-
chology Monographs, August 1939, Vol. 21,
pp. 287-385.


ELEMENTS IN SCIENCE BOOKS THAT APPEAL TO CHILDREN* 


ALICE M. WILLIAMS
State Normal School, Potsdam, N. Y. 


Of the books that parents, teachers, libra- 
rians and other adults buy for children, some 
delight the recipients and become favorites, 
others are more or less politely received and 
thereafter ignored or even more emphatically 
rejected. A few adults have felt competent to 
determine the books which a child should like 
and to condemn the taste of the child who did 
not agree with their dictates. But most adults 
have had a desire to provide for children 
books which they will honestly enjoy and like. 
Much work has been done in the field of fic- 
tion to help adults solve this problem. Very 
little work has been done in the field of fac- 
tual material. Yet there is a wide range of 
factual material that is of interest to children 
and some children even prefer it to fiction. 
The author is especially interested in the field 
of science and feels that many children share 
her interest. For these reasons she undertook 
this study in an effort to discover some ele- 
ments of a book in the field of science that 
appeal to children. 


GENERAL PROCEDURES 


Three different attacks were made upon the 
problem. First the author studied the circu- 
lation of science books in the children’s de- 
partment of nine different libraries. Second, 
she observed a group of children using a se- 
lected group of science books in any way they 
wished. Third, she had a personal interview 
with each of these children regarding the 
books which they had used. 


LIBRARY CIRCULATION OF SCIENCE BOOKS


Five of the nine libraries used in the first 
part of the study were located in New York 
City. The other four were located in villages 
or small cities of New York State. The circu- 
lation of each book in the science division was 
checked for the period of one year. The books
in each library were then arranged on a five-
point scale according to the number of times
they had circulated during the year. Four
hundred forty-five books were found in two
or more different libraries. A special study
was made of these books.

* This paper gives a résumé of some of the findings reported
in a Monograph by the author: Children’s Choices in Science
Books: A Study to Discover Some Elements of a Book in the
Field of Science That Appeal to Children. Child Development
Monographs, 1939, No. 26. Bureau of Publications, Teachers
College, Columbia University.

There was considerable agreement between 
the circulation of the same book in different 
libraries. When grouped according to a 
five-point scale, one hundred forty-eight of 
the 445 books were within the same step on 
the scale of circulation in every library in 
which they were found. One hundred thirty- 
five were but one step apart. This fact has 
added significance when it is realized that in 
each of the libraries these books competed 
with different books for the attention of the 
child patron. 
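The comparison described above can be sketched as follows. The paper does not say how the five steps were formed; the sketch assumes each library's books were ranked by circulation and cut into five equal-sized groups. The function names, book titles, and circulation counts are all hypothetical.

```python
def five_point_steps(circulation):
    """Assign each title a step from 1 (lowest) to 5 (highest) within one library.

    Assumption: steps are equal-sized groups of the titles ranked by circulation.
    """
    ranked = sorted(circulation, key=circulation.get)
    count = len(ranked)
    return {title: (index * 5) // count + 1 for index, title in enumerate(ranked)}


def step_spread(title, step_tables):
    """Difference between the highest and lowest step a title holds across libraries."""
    steps = [table[title] for table in step_tables if title in table]
    return max(steps) - min(steps)


# Hypothetical yearly circulation counts in two libraries.
library_a = {"Aviation": 40, "Bees": 3, "Stars": 25, "Magnets": 18, "Ferns": 7}
library_b = {"Aviation": 31, "Bees": 9, "Stars": 12, "Rocks": 5, "Magnets": 20}

tables = [five_point_steps(library_a), five_point_steps(library_b)]
shared = sorted(set(tables[0]) & set(tables[1]))
spreads = {title: step_spread(title, tables) for title in shared}
print(spreads)
```

A spread of 0 corresponds to a book within the same step in every library holding it, and a spread of 1 to a book whose placements were one step apart.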

There was no apparent difference between
the circulation of a book in the city and vil-
lage libraries. In fact the closest agree-
ment between circulation of given books in
two different libraries was found between a 
village and a city library, while the greatest 
difference was found between two village 
libraries. 

A comparative study was made of the 
books found in the highest fifth of circulation 
in every library in which they were found and 
other books found in the lowest fifth of cir- 
culation in every library in which they were 
found. There was no obvious difference in the 
physical make-up of the books in the two 
different groups. There were bright and dull
covered books in each group; there were long 
and short books in each group; there were 
books with many illustrations and books with 
few illustrations in each group. However, 
there was a noticeable difference in the treat- 
ment of subject matter in the books of the 
two groups. The books in the highest group 
were rich in factual matter. Every paragraph 
helped the reader in understanding the sub- 
ject. There was no apparent effort to interest 
the reader in the content of the book through 
the use of extraneous material. The books in 
the lowest groups, on the other hand, showed 
a marked effort on the part of the author to 
motivate the reader. The authors discussed 
the importance of their subject, and inserted 
extraneous exhortations concerning the won-
ders of nature and the marvels of science.
These books of low circulation also showed a
tendency to moralize: The bee workers were 
held up as models of virtue, the drones as 
“ne'er do wells.” Animals, plants and even 
inanimate objects were considered good in as 
far as they are beneficial to mankind and 
wicked in as far as they interfere with man’s 
welfare. 


The books were then divided into a pop- 
ular, middle and least used group according 
to their average circulation in the nine libra- 
ries. The books in each of these groups were 
then compared with respect to subject matter, 
cover of the book, number of pages in the 
book, type of illustration used in the book 
and the proportion of the book given to illus- 
trations. 


Books dealing with aviation and books giv- 
ing experiments in physics and chemistry 
were especially popular in the libraries. Of 
the thirty-five books dealing with aviation, 
twenty were in the popular group and eight 
in the middle group. Of the twenty-nine 
books dealing with physics and chemistry, 
sixteen were in the popular group and eight 
in the middle group. Books dealing with 
astronomy, animals and machines were also 
popular in the libraries. 


There was no noticeable difference in the 
circulation of books with respect to the cover 
of the book. It is possible that the same book
in two different covers would circulate differ- 
ently, or that children will be more particular 
about the cover of a book they are to own 
than one they are just borrowing from the 
library. Also the cover of a book may affect 
its circulation when it is first put on the 
library shelves, but there was no evidence in 
this study that the cover of a book made an 
appreciable difference in its circulation over 
the period of one year. 


On the whole, books with fewer pages cir- 
culated more frequently than longer books. 
This may be due to the fact that it takes 
longer to read a longer book. The fact that 
of the seventeen books with over 400 pages, 
seven were in the highest group of circulation 
and eight in the middle group, while of the 
eighty-two books with less than one hundred 
pages, forty-six were in the popular group 
and twenty-one in the middle group would 
seem to indicate that factors other than
length are affecting the circulation of these
books.
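
The comparison in this paragraph is easier to follow as shares of each length class. The following is a minimal sketch, not part of the original study: the dictionary layout and names are assumptions, and the "least used" count is derived as a remainder on the assumption that every book falls in exactly one of the three circulation groups.

```python
# Counts reported in the text; structure and names are illustrative assumptions.
groups = {
    "over 400 pages": {"total": 17, "popular": 7, "middle": 8},
    "under 100 pages": {"total": 82, "popular": 46, "middle": 21},
}


def shares(counts: dict) -> dict:
    """Return the percentage of books in each circulation group."""
    total = counts["total"]
    # Assumption: books not counted as popular or middle are least used.
    least = total - counts["popular"] - counts["middle"]
    return {
        "popular": round(100 * counts["popular"] / total, 1),
        "middle": round(100 * counts["middle"] / total, 1),
        "least used": round(100 * least / total, 1),
    }


length_shares = {label: shares(c) for label, c in groups.items()}
for label, s in length_shares.items():
    print(label, s)
```

Under these assumptions the short books are more often popular, but the long books are less often in the least used group, which is consistent with the remark that factors other than length are at work.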




With respect to illustrations, books with 
colored illustrations seemed to be the most 
popular, while those with photographic illus- 
trations were a close second. Books with 
other types of illustrations seemed to circu- 
late about equally. However, it was notice- 
able that clear cut illustrations dominated in 
the popular group. There was a more notice- 
able difference with respect to the proportion 
of the book given to illustrations. Half of the 
books with an illustration on every other page 
were in the popular group. However, thirty- 
six per cent of such books were also in the 
least used group. These books seemed to ap- 
peal chiefly as picture books. Again the clear 
cut illustrations were the popular ones. There 
was no apparent difference in the circulation 
of books giving other proportions of the books 
to illustration. 


A brief survey of the above data would 
seem to indicate that: books dealing with avi- 
ation, physics, chemistry, animals and astron- 
omy are more popular than other books; color 
of the cover, and the number of pages are 
negligible factors; colored and photographic 
illustrations are preferred to other types of 
illustrations; books with many illustrations 
are preferred to books with few illustrations. 


OBSERVATIONS OF CHILDREN’S RESPONSES TO
VARIOUS SCIENCE BOOKS


The foregoing procedure represents an in- 
direct method of approaching the question. 
The adult examiner may have missed the cru- 
cial features of the book from the child’s 
point of view. To ascertain the elements in 
the books that attracted or repelled children, 
ninety-six children were observed while using 
a selected group of books. The children were 
in the fifth and sixth grades of the School of 
Practice of the State Normal School, located 
at Potsdam, New York. The thirty-five books 
were chosen from the larger list of books cov- 
ered in the first part of the study. The lim- 
ited list represented each of the fields of sci- 
ence and each of the circulation groups dis- 
cussed above. Table I gives the list of books 
used in the study and their group according 
to library circulation. Each of these books 
was graded by the Washburne and Vogel 
method of determining the grade difficulty of 
a book and on the Bamberger Score Card for 
Evaluating the Various Factors in the Physi- 
cal Make-up of a Book for Primary Children. 
Each of the children was given the National
Intelligence Test Form A and the Detroit
Reading Test Form A. The children ranged
in mental age from nine years and two
months to seventeen years and four months,
and in reading score from 4B to 9A. Among the
parents of the children were farmers, day
laborers, business men, professional people
and people on relief. There were children who
liked to read and children who did not like
to read. Most of the children habitually engaged
in some form of out-door activity.


TABLE I

BOOKS USED IN THE STUDY WITH CHILDREN AND THE GROUP OF LIBRARY CIRCULATION
IN WHICH EACH BOOK WAS FOUND

Author              Title                                         Group in Library Circulation*
                    American Boys Book of Wild Animals            L
Burgess             Animal Book for Children                      M
Collins             Experimental Chemistry                        P
                    Fishes and Other Sea Animals                  M
Doubleday           Birds Every Child Should Know                 M
Hawksworth          Strange Adventures of a Pebble                L
Hegner              Parade of the Animal Kingdom                  P
                    Children of the Tide                          L
Kearney             Strange Fishes and Their Strange Neighbors    P
                    Book of Birds for Young People                P
Morgan              First Electrical Book for Boys                P
Reck                Automobiles from Start to Finish              P
Rogers and Beard    Heels, Wheels and Wires                       M
Rush and Winslow    Modern Aladdins and Their Magic               L

* P = popular group; M = middle group; L = least used group.


For the observation part of the study, the 
thirty-five books were placed on a long table 
in a room. The children came to the room in 
groups of fifteen to twenty. The investigator 
told the children that she was a science 
teacher; these books were science books; she 
wished to know what boys and girls liked in 
science books. She would like to have the
children look at these science books. She
would like to have them choose some books
they would like to read or look at more
closely. They could take such books to a table
or chair and use them as they wished.


The children circled around the table and 
looked at the books as they lay there. It was 
interesting to note that each child in time 
thumbed through the pages of all the books. 
Yet at the end of about ten minutes they had 
each decided upon a book to use. Enough 
copies of each book were provided so that 
each child could have every book he desired 
sometime during the study. On each of the 
following periods the books were on the table 
when the children entered the room. They 
took any book they liked and used it as they 
wished. There were two other long tables in 
the room and two small square ones. There
were enough chairs for each child to have one.
The children sat where they wished, some 
even ignoring the chairs and standing by the 
windows to read. The small tables were espe- 
cially popular with children who wished to 
use one copy of a book together. 


While the children were using the books, 
the observer stood or sat at the back or side 
of the room with a magazine and a secretary’s 
notebook. She noted each overt response to a 
book and the name of the child making the
response. When a child discussed the text or
illustrations of a book with another child she
recorded verbatim their remarks and ques- 
tions. Each five minutes the time was indi- 
cated in the notes so as to give an indication 
of the time spent on the various books or dis- 
cussions and the time spent reading between 
such discussions. The observer’s watch had a 
second hand so she could time individual re- 
actions within half a minute. The results of 
the observer were checked twice by a second 
observer. 


The children were accustomed to a fairly 
free set-up in the school room and to the 
presence of adults engaged on other work in 
the room. After the observer had explained
what she wished them to do for her, they gave 
her very little attention. They would ap- 
proach her for the pronunciation of a word, 
the meaning of a phrase, for the explanation 
of an illustration, to ask if a certain state- 
ment were true or not, to have her settle an 
argument between two or more of them, to 
show her particularly attractive illustrations 
or read her choice selections from the text. 
Only one child showed any interest in her 
activity or consciousness of it. He was a fifth 
grade boy who asked what she read in the 
magazine and what she wrote from it. He took 
the magazine, which happened to be a current 
issue of The Nation, thumbed through the 
pages and asked if there were any science in 
it. He then thumbed through a few more 
pages, sighed and returned to his science 
book. The children seemed to be of the opin- 
ion that the observer was busy with work of 
her own while they looked at books. Since it 
is a fairly common practice for adults to do 
other work while the children have a free 
reading period, it seems reasonable to accept 
the children’s apparent reaction. 

At the end of the period the reactions of 
the children to the various books were compiled.
From these specific reactions the author
could learn those elements of a book to which
the children made an overt response. She 
could see which illustrations caused them to 
laugh; what facts they shared with other 
children; what books they read alone; what 
books they ignored. She could see those ele- 
ments to which they gave overt disapproval 
and those factors in terms of which they 
recommended books to each other. 


INTERVIEWS WITH CHILDREN 


These observable reactions represent only a 
part of a child’s response to a book. They 
were significant because they were spontane- 
ous. But the child may have liked elements 
he did not discuss with other children. Some 
children are more social in all their reactions 
than others. Some children are more inclined 
to try to influence the choices of others. Why 
did the children pore over one illustration and 
skim quickly past another? Did a child prefer 
the book he recommended to another child 
to a different book with which he spent an 
even longer period of time? To find what the 
children themselves believed to be the attrac- 
tive features of a book, a personal interview 
was held with each of the children. For this 
interview one copy of each book was placed 
on a table. The child took from these books 
those which he knew and had used. No effort 
was made to have a child discuss any book 
which he did not choose. The child was then 
asked to arrange the books in the order in 
which he liked them. Most of the children 
spent quite a bit of time in arranging the 
books. Usually the most favored three or 
four books were placed in position very 
quickly. The children were much slower in 
arranging books at the lower end of the list. 
They usually thumbed through the pages of 
each of these books and read a bit here and 
there, as though they wished to recall the 
books more clearly to mind. It would seem 
that they remembered better those books 
which they liked. 

The interviewer listed the books in the 
order in which the child arranged them. Be- 
ginning with the best liked book the child was 
then asked to state what he liked or disliked 
about each book. Most of the children gave 
definite and concise answers to this question, 
but a few merely answered, in general terms,
to the effect that the book was interesting. 
To these children, the interviewer said, “Yes, 
but I want to know what makes it interesting
to you. Another boy or girl might not like
this book.” A few of the children began to 
recount the material in the book. To these 
children the interviewer said, “Yes, that is 
what the book is about, but I want to know 
why you like the book. What is there in or 
about this book that you like?” In answer to 
these questions most of the children gave defi- 
nite answers. In case a child was still not 
specific about any book, no further effort was 
made to cause him to give a specific state- 
ment. 

None of the children was as articulate in 
stating reasons for disliking a book as in giv- 
ing reasons for liking a book. Many of the 
children simply said, “I don’t know why. I 
just don’t like it.” or “I don’t know. I just 
didn’t get interested in it.” Several children 
did, however, give specific reasons for dislik- 
ing special books. The interviewer took down 
the responses of the children verbatim. 


After the last book had been discussed in 
this manner, the child was asked to arrange 
the books in the order in which he liked the 
illustrations. The interviewer listed the books 
in this order. The child was then asked to 
state what he liked or disliked in the illus- 
trations and his responses were recorded ver- 
batim. These responses were compiled accord- 
ing to books. A study of these responses and 
of the overt reactions of the children to the 
book revealed some factors of a book in the 
field of science which attract or repel children. 

Judging from both their overt and verbal 
reactions, the subject matter of a book is of 
crucial importance to children. They men- 
tioned it in recommending a book to other 
children. They asked the observer if she had 
any books on special subjects that interested
them. In discussing the book in the interview 
their first response usually referred to the sub- 
ject of the book. 

One girl responded about one of the animal 
books, “Why, it’s about animals.”

Then she added slowly, “I guess I’d like 
any book about animals.” 

A fifth grade boy remarked in discussing 
a book about automobiles, “I love automo- 
biles. I love new style automobiles and first 
automobiles. I love racing automobiles and all 
kinds of automobiles.” 

Boy after boy remarked that he liked airplanes
or was interested in aviation.




Different children have different interests. 
But most children have many interests and 
many subjects appeal to most children. 

Animal books were popular with most 
though not all of these children. They pored 
over animal books, shared the illustrations 
and the factual content with each other, and 
even supplemented the material in some of the 
books with experiences of their own, during 
the observation part of the study. In discussing
the animal books in the interview, the
fact that it was about animals or about a specific
animal, such as a frog, beaver or fish,
was given as a reason for liking the book.

Mammals and snakes were the most popular 
of the animals, but birds, fish, bees and pre- 
historic animals had their adherents, and one 
sixth grade girl said that Protozoa were the 
most interesting. This girl, during the obser- 
vation part of the study, had read the entire 
chapter of Hegner’s Parade of the Animal 
Kingdom that was given to Protozoa, had 
pored over the illustrations and had gone 
back to the section several times, so her 
observed reactions agreed with this statement. 

One girl remarked that she liked “big ani- 
mals the best, especially those that kill 
people.” 

Judging from the time spent on these ani- 
mals during the observation part of the study 
and from their comments as they talked with 
each other, many children shared this in- 
terest. 

Animal books that told how the animals 
lived, how they procured their food, cared 
for their young and protected themselves from 
their enemies were especially popular with 
these children. 

One sixth grade boy remarked in discussing 
Hawk’s Shaggycoat, “It was swell. It told 
how they did things every day.” 

In referring to Gall and Crew’s Wagtail, a 
fifth grade girl said, “It told more about frogs 
than any book I ever read.” 

This information was valued especially high 
if it dealt with a phase of the subject with 
which the child was previously unfamiliar. 

Again and again children rated books high 
because “It told me so many things I didn't 
know.” 

Usually a specific fact was mentioned as an
illustration of this characteristic of the book;
for instance the fifth grade boy who liked 
Kenly’s Green Magic because, “It told about
plants that look like animals. I didn’t even
know there were such.” 

The negative side of this same appeal is 
found in several children who rejected Patch’s 
Holiday Hill because in the words of a sixth 
grade girl, “It didn’t tell enough about snakes 
or plants or anything.” 

A sixth grade boy expressed the same 
opinion and then added, “I think it was 
written for girls.” 

One boy remarked that Morse’s Creepers 
and Sliders was “too darn short.” 

Strange animals were very popular. Maybe 
they aroused an interest that makes the 
animals rival the clown in popularity in 
a circus parade or sends hundreds of 
children each holiday to the various zoos 
throughout the country. However that may 
be, these animals were popular in every book 
in which they appeared. The illustrations and 
the facts concerning them were shared and 
discussed during the observation part of the 
study and they were pointed out to the in- 
vestigator during the interview. 

A fifth grade girl finished her discussion of 
Ditmar’s Prehistoric Animals by saying, “It’s 
a strange book. I never read anything quite 
like it before. I like it. I like it better than 
common books. It is more interesting.”

Another fifth grade girl expressed her 
pleasure in the same book and said, “It had 
such queer animals. Some I never dreamed of 
before, not even at night.” 

On the other hand, pets also made a strong 
appeal to these children. Child after child 
pored over the illustrations of such animals. 
They described to each other various pets
they had owned or known. They discussed the
merits of various animals as pets, ranging
from snakes to monkeys. During the interviews
similar reactions were noticed.

The classification of the animal kingdom 
also intrigued these children. Every child dis- 
cussing Burgess’ Animal Book for Children in 
the interview referred to this information in 
some way. During the observation part of the 
study they discussed the various related ani- 
mals with each other. 

During this part of the study one fifth 
grade girl called the observer to her and 
asked, “Are owls and eagles the same thing?” 

The observer replied that they were two
different birds and turned to get a book show- 
ing illustrations of each. 




But the child continued, “Yes I know that. 
But do they belong together? Are they re- 
lated like this says some of the animals are?” 

Books dealing with airplanes, automobiles, 
trains and wheels, especially those which told 
how the machine was made or how it worked, 
were popular with these children. They 
showed the illustrations in these books to 
each other. They discussed the workings of 
the various machines and compared their 
various advantages. 

During the interview one fifth grade girl 
said about Pryor’s Trainbook, “It tells how 
trains are run. It tells about the people riding 
on the train too, but mostly how it runs and 
that’s why I like it.” 

In the interviews, the assembly line was 
declared the best part of Lent’s Wide Road 
Ahead. The car being pushed over the cliff as 
part of the testing program was also pointed 
out. Much time and attention was given both 
these sections during the observation part of 
the study. When one child read aloud that a 
car was made in fifty minutes a lively dis- 
cussion followed. 

Finally one boy seemed to crystallize the 
thinking of the group by saying, “They 
couldn’t make a car in fifty minutes if they 
didn’t have everything ready.” 

Aviation was especially popular with the 
boys, but none of the girls referred to aviation 
books in the interview. During the observa- 
tion period, a few girls tried to read Mingos’ 
Flying for 1937, but they seemed to lack the 
necessary background for understanding the 
book. Judging from their reactions, there is 
need for a simple book on aviation that will 
explain the mechanical working of an airplane 
to a girl who does not know an amphibian 
plane from a hydroplane, nor any of the other 
seemingly obvious facts and terms. 

Seven of the boys wished to be aviators. 
They were delighted with the information 
given concerning the training of pilots. As was 
noted concerning the animal books, it was the 
facts they had not known before that espe- 
cially interested the children. One girl in dis- 
cussing Pryor’s Trainbook worded it, “There 
were some things in the book I didn’t know. 
I liked that part. When I read the part I 
already knew like getting on trains and the 
things that are on trains, I didn’t care about
it.”

Strange or unusual machines such as the
rollers of the Egyptians or the Chinese Wheelbarrows
(Petersham’s Storybook of Wheels)
delighted many children.

The historical development of machines 
from the simplest to the most complex inter- 
ested these children. This fact was mentioned 
in the interview in discussing every book that 
traced this development. It was mentioned 
even in reference to Lent’s Wide Road Ahead 
and Reck’s Automobiles From Start to Finish, 
which give but a small proportion of their 
contents to this subject. 

In discussing Petersham’s Storybook of 
Wheels, one boy expressed this interest in the 
words, “I like to know how the first wheels 
were made, how the early wagons were made 
and how they changed up to the present.” 

Numbers and size interested many children. 
They seemed to make a definite effort to visu- 
alize unusually large or small things. One 
group of fifth graders discussed the number 
of cars in the parking place at Jones’ Beach. 
They reached the verdict that there were 
more cars in that picture than children in 
their school but not as many as people in 
their town. Two of the boys endeavored to 
count the cars, but neither finished the task. 
The huge press for stamping out the steel 
bodies of cars (Reck’s Automobiles From 
Start to Finish) caught the attention of the 
children. Four boys changed the 750 tons to 
pounds to see how much that “really was.”
The size of the dinosaurs was estimated in 
terms of school rooms, that of the hippo- 
potamus in terms of blackboard slates. 
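
The conversion the four boys attempted can be written out as follows. This is a sketch under a stated assumption: the text does not say which ton is meant, so the U.S. short ton of 2,000 pounds is assumed here, and the function name is hypothetical.

```python
# Assumption: U.S. short ton; the text does not say which ton is meant.
POUNDS_PER_SHORT_TON = 2000


def tons_to_pounds(tons: int) -> int:
    """Convert short tons to pounds."""
    return tons * POUNDS_PER_SHORT_TON


press_pounds = tons_to_pounds(750)
print(f"{press_pounds:,} pounds")
```

On that assumption the 750-ton press comes to one and a half million pounds.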

Efforts to motivate the reader by talking 
about the subject or discussing the wonders 
of science or the glories of nature were uniformly
rejected by these children. Facts presented
in simple, direct form were much preferred.
The children seemed to have a real
interest in science and efforts to dilute it were 
frowned upon. 

One boy said in discussing one such book, 
“You have to read too much to find out what 
you want to know.” 

Efforts to make a book more attractive 
through the use of personification were 
ignored or disapproved of by these children. 

In discussing Gall and Crew’s Wagtail a 
fifth grade girl remarked “It was silly to have 
the frogs talk.” 

During the observation part of the study, 
a fifth grade boy called the observer and 
showed her an illustration of a garter snake 
called “Sir Talis”. 




“Is that a garter snake?” he asked.

On being answered in the affirmative he 
said, “Then why didn’t they say so?” 

None of the children commented favorably 
upon this device. 

The use of a story in the presentation of 
factual material received two different re- 
sponses. A story which had adventure and 
convincing characters was liked. One which 
was merely a chronicle of events or a framework
for presenting facts was ignored if the
story was so insignificant that the child could
gain the factual content without becoming too
conscious of the story element. If the story
took much space in the text, or if the char- 
acters interrupted explanations with questions 
of their own or in any other way interrupted 
the presentation of factual material these chil- 
dren objected to the story. 

Adventure in itself was attractive to the 
children. Many children remarked they liked 
a book because it was exciting or because the 
animals or people had adventures. Even more
children told certain exciting incidents from 
the text as examples of the book’s appeal. 
Even books which were not narratives, such 
as Washburne’s Story of Earth and Sky and 
Rogers and Beard’s Heels, Wheels and Wires
were described as adventurous. 

In discussing the former a sixth grade girl 
said, “Time they got near the sun they were 
all hot and bothered. They could hardly
breathe and neither could I.”

The device of having an adult or other 
character give information to a child character 
was ignored or disapproved of by these children.
In books such as Pryor’s Trainbook or
Lent’s Wide Road Ahead, this feature of the 
book was minor enough to be ignored. When 
the characters intruded more noticeably into 
the text the children objected to them. 

In discussing such a book one sixth grade 
boy remarked that he liked “about the stars” 
and he liked the legends but he didn’t like 
“the characters”. 

Humor both in the text and in the illustrations
was popular with these children.
Play on words, odd looking animals or people,
absurd machines or situations were funny to
the children. There was a noticeable difference
in the things which children of different
maturity considered funny. The more mature
children were delighted with the rhyme of the
great fleas that had little fleas and the description
of the kissing amoeba in Hegner’s Parade
of the Animal Kingdom; with the decision of
the judge in Rogers and Beard’s Heels, Wheels
and Wires that the potter had a right to take
clay from the road for his work. The cat who
should diet and the pig with a clean face in
Burns’ Animal Fair pleased the less mature
children. Old fashioned cars and the need “to
get out and get under” (Lent’s Wide Road 
Ahead), the efforts of the Indians to stop a 
train with a rope, and Johnny Bear with the 
syrup can (Seton’s Lives of the Hunted), 
amused all of them. 

Books that told how to make or do things 
were popular. Morgan’s First Electrical Book 
for Boys was rated high because “It’s all 
about electricity. It shows you how to make 
nice things, like little gadgets and telegraph 
sets.” Mathew’s Book of Birds for Young
People and Doubleday’s Birds Every Child 
Should Know were recommended because 
“you could tell the birds when you see them.” 


However, such books are likely to be of 
value and prized by children only if the nec- 
essary materials with which to work are avail- 
able. Morgan’s First Electrical Book for Boys 
and Collins’ Experimental Chemistry were 
equally popular in the library circulation. 
The materials for the experiments of the first 
book were available to some of the children. 
These individuals were enthusiastic about the 
book. None of the children had access to the 
materials for the experiments in the chemistry 
book. Only two children (both girls) used 
this book. One seemed to try to work the 
experiments imaginatively, but she was not 
satisfied with this makeshift. The other copied 
some of the experiments and tried them at 
home, but “couldn’t get it very well”. 

The children leaped across the boundaries 
of grades with no apparent hesitation. One 
sixth grade girl remarked rather apologet- 
ically that Burn’s Animal Fair was too young 
for her but added emphatically that she liked 
it. One boy pronounced Burn’s Animal Fair, 
Petersham’s Storybook of Wheels and Pryor’s 
Trainbook babyish. He had been slow learn- 
ing to read and seemed still to be judging the 
difficulty of a book by the number of pages
and the lack of illustrations. The other chil- 
dren ignored this feature of a book entirely. 
Every child with a ninth grade reading score 
included some fourth grade books among his 
favorites. On the other hand, all of the chil- 
dren except those with ninth grade reading 
scores (there were no books rated above ninth
grade in difficulty) included some books above
their reading score among their favorites. 
Most of these children would try to read any 
book that interested them regardless of its 
vocabulary difficulty. A boy with a fifth grade 
reading score read Mingos’ Flying for 1937
(ninth grade difficulty). He read very slowly, 
reread much of the material and asked for the 
meaning of many words but he read the book. 


In fact, difficulty of concepts that were pre- 
sented seemed to trouble these children more 
than difficulty of vocabulary as such. Reed’s 
Stars for Sam seemed too difficult for children 
with a mental age of less than twelve years 
or with only a fair background in astronomy, 
regardless of their reading scores. 

Illustrations seemed to be an integral part 
of the book to these children. They were not, 
however, regarded by all as indispensable. In 
fact, one boy remarked that a book could be 
good, “even if it didn’t have any pictures.” 
But if they were present they were a part of 
the book to which the child responded. Some 
of the books were chosen and became favor- 
ites because of their pictures. The children 
preferred realistic pictures. “They look so 
real,” was their most common recommenda- 
tion of such an illustration, or they might say, 
“They look so much like the animal (car or 
whatever the child was discussing) really 
looks.” They seemed to have more confidence 
in such illustrations. Pointing to an illustra- 
tion of a red fox, one boy said, “Now that 
really looks like a fox. Now I never saw this 
animal (pointing to an illustration of a 
hyena) but it probably looks like that.” 


Photographic illustrations were especially 
liked. One boy said, “Photographs are the 
best,” and another, “They look as though 
they had been taken not as though they had 
been drawn,” while a third boy complimented 
the illustrations in Seton’s Lives of the 
Hunted by saying, “They are most as good 
as photographs.” 

However it was only the best of the photo- 
graphic work that appealed to these children. 
“Just pictures” were promptly dismissed as 
such. The animal should be alert, preferably 
doing something. The parts of the car must 
be clear and distinct. The pictures were pre- 
ferred because they were real and the young 
reader wanted the details. For example, 
Kearney’s Strange Fishes and Their Strange 
Neighbors was commended because “you can
see the stripes, and the spots, and the fins,
things you can’t see in lots of books.” 

The essential requirement seemed to be 
that the illustration conveyed meaning to the 
reader. Such pictures were popular regardless 
of the medium used. On the whole these chil- 
dren did not like line drawings. They referred 
to them as “just sketches”. One fifth grade 
girl remarked, “I like to watch people make 
sketches. But I don’t like them in a book.” 
This seemed to be the general attitude of all 
the children. But the line illustrations in 
Morse’s Creepers and Sliders were very pop- 
ular because “You can get the story from the 
pictures.” 

Remarks such as, “You couldn’t get it so 
good from just the reading;” “The pictures 
show you just what he means;” “I learned a 
lot from these pictures;” showed the tendency
of these children to use the illustrations as a 
source of information. They seemed to feel 
the photographic ones were the most reliable. 
One boy said “A photograph shows you how 
it really is. An artist might paint it different.” 
Yet any picture that conveyed meaning to 
them was accepted, while one sixth grade girl 
rejected a book because “I couldn't under- 
stand it, not even the pictures.” 

The children liked pictures of strange or 
unusual things, such as prehistoric animals,
or animals of foreign climes, machines and 
customs of foreign lands. One boy said in 
discussing the illustrations of Petersham’s 
Storybook of Wheels, “I like the pictures of 
things you don’t usually see.” 

In the same manner charts, diagrams and 
maps were popular. The children studied 
them at length during the observation part 
of the study and commented on them in the 
interview. However, they must be understandable
and usable from the child’s point of
view. This aspect of the subject was con- 
stantly stressed in the interviews. One boy 
remarked about Morgan’s First Electrical 
Book for Boys, “The pictures tell you how 
to fix things. You couldn’t make batteries or 
anything without the pictures.” 

Color was liked especially if it added to 
the information conveyed by the picture. 
This was especially true with respect to the 
bird books. “You needed the colors to see 
what the bird really looked like.” On the 
other hand, most of the children announced 
that the photographs were better not colored
because, “if it were colored you would lose
some of the details.” 

These children seemed to feel that the illustrations
and the text should supplement each
other. This point was illustrated by such remarks
as: “The explanations help you understand
the pictures;” “The pictures and reading
together give you a good idea;” “You can
get a lot from just the pictures, but more if
you read the reading too;” “The pictures and
the reading go right together.”


A STUDY OF ADULT JUDGMENTS CONCERNING
CHILDREN’S SCIENCE BOOKS


For the purpose of discovering adult re- 
actions to the thirty-five books used in the 
study, 131 normal school students of the Potsdam
State Normal School were asked to rate
each book, “A” if they thought it would be 
enthusiastically liked by children, “B” if they 
thought children would like the book very 
much, “C” if they thought the book would 
be liked only fairly well by children, and “D” 
if they thought children would be indifferent 
to it or even dislike it. They were also at 
another time asked to indicate their own re- 
action to the book in the same manner. One 
group of the adults indicated their opinion of 
children’s reactions first and their own reac- 
tion second. The other group reversed this 
order. There was no apparent difference be- 
tween the responses of the two groups. 

As has been noticed in previous work along
this line, adult opinion is not a good indication
of children’s taste. Hegner’s Parade of
the Animal Kingdom was one of the most
popular books used in the study. Yet only 20
percent of the adults felt this book would appeal
to children. Some of the adults were even
quite emphatic in their opinion that children
would not care for it. Adults do not seem to
be sufficiently critical in judging children’s
books. Some books that fell in the middle
group judged by any one of the three methods
used in studying children’s reactions were
rated “A” or “B” by over 75 percent of the
adults. There were eight books in the study
which judged by any one of these three methods
fell in the least used or liked group. Yet
every book was rated either “A” or “B” by
at least 27 percent of the adults.

The adults seemed to take the presence of
a story as a factor that would insure children’s
liking and they seemed to be more
affected by the format of a book than were
the children. Of the three books rated “A” or
“B” by over 90 percent of the adults, two 
have colored illustrations and one has photo- 
graphic illustrations. The pictures in all three 
are appealing and were rated high by both 
children and adults. Seven of the ten books 
rated “A” or “B” by 75 percent or more of 
the adults had bright colored covers. Both 
Hegner’s Parade of the Animal Kingdom and 
Mingos’ Flying for 1937 which were rated 
high by children and low by adults had more 
subdued colored covers. On the whole, the 
twelve books rated “A” or “B” by 68 percent 
or more of the adults are more attractive in 
format than the other books. The covers are 
bright, the type large and the illustrations 
clear. 

On the whole, there was a better agreement 
between the books which adults themselves 
liked and those children liked than between 
those they thought children would like and 
the ones children liked. Seventy-three percent 
of the adults liked Hegner’s Parade of the 
Animal Kingdom as contrasted with the 29 
percent who thought children would like it. 
On the other hand the narratives that were 
popular with children were not so with the 
adults. 


COMPARISON OF PUBLISHED BOOK REVIEWS
AND CHILDREN’S RESPONSES 


A further study of adult reaction to these 
thirty-five books was made through the pub- 
lished reviews of the books. One hundred 
thirty-two book reviews were considered. 
Each of the books reviewed was considered 
good by the reviewer. Since he was discussing 
books for prospective customers, books which
the reviewer did not consider good were not 
discussed. In this study, the adult judges 
fared no better than in the study just de- 
scribed. Many of the reviewers remarked that 
the book would interest children, that it was 
attractive or even fascinating. Among the 
books so described were books found in the 
least used and least liked group by each of 
the three methods used in studying children’s 
reactions. 


The reviewer usually indicated something 
about the subject of the book. However he 
gave little if any attention to the develop- 
ment of the subject or to how it was treated. 
Only seven of the 132 reviews gave a picture 
or idea of the subject and content of the book 
as a whole. Yet this aspect of the book was
most important to the young readers whose
behavior was studied in this investigation. 


On the whole, the illustrations received 
but very little attention from the reviewers. 
Some merely stated that the book was illus- 
trated. Most of these indicated the type illus- 
tration used. But only a few described the 
content of the illustrations; very, very few 
indicated that the child could get information 
from the illustrations, and none discussed the 
type of information thus to be secured. Yet 
this feature of illustrations was important to 
children. 


Personification and the use of a child char- 
acter to whom the scientific information of 
the book was explained were frequently men- 
tioned as attractive features of the book re- 
viewed. Yet the children of this study indi- 
cated a dislike for both these devices. 


Eight of the reviewers remarked that a 
book was good for a boy. Among these books 
were Reed’s Stars for Sam, Reck’s Automo- 
biles From Start to Finish and Lent’s Wide 
Road Ahead. All of these books appealed to
the girls as well as the boys in this study. 


In short, the reviewers of these books 
seemed to consider those elements that ap- 
peal to the adult purchaser rather than those 
which appeal to the child reader. 


SUMMARY 


The results of this study indicate that in 
the field of science literature, children like 
books that are rich in factual content; books 
that present material that is new to the 
reader; books that tell of the habits, adapta- 
tion and peculiarities of animals; books that 
tell how machines are made and work; stories 
that are strong and vigorous; stories in which 
the characters have convincing personalities; 
and books that tell how to make and do 
things when the materials necessary for the 
use of such books are available to the chil- 
dren. The use of personification, the practice 
of motivating the reader by discussing the 
importance or wonders of the subject, the use 
of a character to ask questions and to receive 
the information which the author wishes to 
present are at best questionable devices for 
securing a child’s interest and may prove 
detrimental. 


Illustrations are treated as an integral part 
of the book. They are valuable when they
convey information or meaning to the reader.
Pictures which failed to do this were rejected. 
Realistic illustrations were preferred and pho- 
tographic illustrations were the most popular 
of these. Colored pictures were preferred if 
the color added to the information conveyed 
by the illustration. Charts, maps and dia- 
grams were liked if they were simple enough 
and clear enough so that the children could 
understand and use them. 

On the whole, adults when judging a book 
for children seemed to be affected by the 
format of the book and devices to make the
content attractive rather than by the content
itself. Children, in choosing books and reacting
to them later, seemed to be more affected
by the actual content of the book. Published 
reviews gave a relatively poor indication of a 
book’s actual appeal to children or of the fea- 
tures of the book that appeal most to chil- 
dren. Adults’ ratings of books on the basis of 
their own liking for them showed higher 
agreement with children’s interests and rat- 
ings than did adults’ ratings of books in terms 
of whether they thought children would like 
them. 


THE EFFECT ON PUPIL GROWTH OF AN INCREASE 
IN TEACHER’S UNDERSTANDING OF PUPIL BEHAVIOR 


RALPH H. OJEMANN AND FRANCES R. WILKINSON
Iowa Child Welfare Research Station, State University of Iowa, Iowa City, Iowa


It is clear that effective learning cannot 
take place unless a strong motive is present. 
It is equally clear that it is difficult to moti- 
vate a child whose energies are spent worry- 
ing about in-school or out-of-school situations 
or whose wants at the moment are of such a 
character that the work in school does not 
seem to contribute to his wants. If the teacher 
does not know that perhaps John is worrying 
about getting to his street corner on time to 
sell his papers, or that he does not take part 
in class discussions because of a feeling of 
inferiority, or that he is worried about his 
home, she is not likely to succeed in stimulat- 
ing John to do his best. On the other hand 
knowing the child’s attitude, conflicts, and 
purposes should make the teacher a better 
guide in planning an effective program of 
work for him. 


Furthermore, if the development of person- 
ality in its various aspects were closely 
watched by the teacher it would appear highly 
probable that the beginnings of behavior diffi- 
culties could be detected long before the 
difficulties emerge as serious behavior prob- 
lems. Teachers and administrators long ago 
learned that it is impossible to direct the 
child’s growth in school subjects without care- 
ful and frequent checks on the progress the 
pupil is making. Numerous formal and in- 
formal tests and other diagnostic devices are 
used by teachers and administrators to deter- 
mine growth in the subject matter areas. 
Teachers are trained to revise their teaching 
procedures in the light of pupil growth as determined
by such methods.

In the area of personality growth, however, 
the situation is quite different. The present 
tendency in schools is to wait until some 
“maladjustment” or “behavior problem” ap- 
pears. Then, but usually not before, person- 
ality growth becomes a matter of concern. 
And it is the “problem child” that becomes 
the center of emphasis. This tendency to wait 
until the child gets into difficulty before giv- 
ing attention to personality growth appears 
analogous to waiting until the pupil has failed
on a final examination before giving consid- 
eration to his growth in knowledge. Neither 
course of action is logical. It would seem that 
if the classroom teacher had at hand informa- 
tion about the child’s personality and were 
trained to follow the development of person- 
ality in its several aspects, just as she follows 
the course of growth in reading, spelling, or 
history, she could detect the beginnings of
behavior problems and redirect development 
long before the deviations become serious. 
Thinking only in terms of problem children 
is not adequate for effective guidance. 

The foregoing suggestions, namely, that 
learning becomes more effective and that the 
development of personality can be more ade- 
quately controlled if a careful analysis of 
behavior is made by the teacher, are in the 
nature of assumptions. The study reported in 
this paper was designed to test experimentally 
the two assumptions. 

The general method of the study consisted 
in selecting an experimental and control 
group, obtaining measurements of both groups 
at the beginning of the experimental period, 
assisting teachers in making an analysis of 
the motives, attitudes, and environmental 
conditions of the experimental subjects, and 
measuring growth again at the close of the 
experimental period. Underlying the plan of 
the study is the conception that behavior is 
determined by such factors as motives, psy- 
chological equipment in the form of attitude, 
emotional control, etc., and the presence or 
absence of direct restrictions in the environment.
It was assumed that to chart the
growth of personality, data relative to the
child’s ambitions or motives, attitudes, emo- 
tional stability, and the nature of the home 
and community environment would be needed. 

From approximately one hundred thirty- 
five pupils of the ninth grade of a public 
school, sixty-six subjects for whom various records
required in the equating procedure were
available were selected and divided into an
experimental and control group of thirty-three 
subjects each. The two groups were equated
in terms of chronological age, scores on Otis
Group Intelligence Test, and achievement of
previous year as measured in grade points.
As indicated in the following tabulation the
two groups were rather closely matched in all
three factors.

                                      Experimental Group      Control Group
                                               Standard                Standard
Factor                                 Mean    Deviation       Mean    Deviation
Chronological age, years               14.4      .79           13.9      .52
Otis Group Intelligence Test score    109.7     7.90          108.3     9.09
Achievement, previous year             3.14      .82           3.15      .62

The comparisons of the experimental and
control groups were made in the following
areas: school achievement, selected attitudes,
personality conflicts, and certain ratings of
pupil adjustment. School achievement was determined
in terms of grade points. An attitude
test including items relative to the pupil’s attitude
toward his school, his teacher, his home,
and toward himself was administered at the
beginning and at the close of the experimental
period. Personality conflicts were tested by a
revision of Luria’s method.¹ By the use of
this method it seems possible to obtain fairly
reliable indications of the extent of personality
conflicts, and some indication of their
character and of the accompanying mental
processes. The method has shown a very satisfactory
reliability in a number of studies
and also a satisfactory correlation with other
indices of personality conflict.²

1 Luria, Aleksandr R.: The Nature of Human Conflicts: Or,
Emotion, Conflict and Will: An Objective Study of Disorganization
and Control of Human Behavior. Trans. by W. H.
Gantt. New York: Liveright, [c. 1932]. Pp. xvii, 431.

2 Ackerley, Lois, Ojemann, Ralph H., Neil, Berniece, and
Grant, Eva: A Study of the Transferable Elements in Interviews
with Parents. J. Exper. Educ., 1936-1937, 5, 137-174.

In addition to the data on grade points,
attitude, and personality conflicts, teachers’
ratings of the general adjustment of pupils at
the beginning and at the end of the study
were included.

At the beginning of the experimental period
personality and environmental data were obtained
for the experimental subjects. These
data were made available to the teachers who
were given rather extended suggestions as to
their meaning and use. From the attitude test
such items as the nature of the child’s ambitions,
his satisfaction or dissatisfaction with
various aspects of his home environment, his
attitude toward his companions, and the like
were obtained. Through a carefully conducted
interview with parents data were secured
relative to the home environment of
the child, the nature of the parental attitudes,
and the nature of the child’s behavior at
home. In carrying out the interview every
attempt was made to build a co-operative
relationship between the parent and the investigator.
In general the parents were found
to be very co-operative. The interviews varied
in length from approximately one to two
hours.

With the data for each experimental subject
at hand the investigator summarized the
important facts in written form and added
her interpretations of the situation. An appointment
was then made with each teacher
who had the subject enrolled in her class.
After a satisfactory working relationship had
been established between teacher and investigator
the analysis was presented, the investigator
pointing out the essential facts and
making sure that the teacher understood
them. Suggestions that seemed helpful in understanding
and controlling the pupil’s behavior
were supplied. It was assumed that the
teachers would need help in interpreting child
behavior and to this end the analyses and
suggestions were made very complete and
care was taken that the teacher developed a
functional understanding of the pupil’s behavior.
Considerable use was made of alternative
suggestions. The teacher then proceeded
to apply her knowledge in planning the
child’s daily work and in the conduct of her
classes.

The investigator made it a point to drop in
from time to time to discuss the pupils’
progress informally with the teachers. The
co-operative relationship that had been built
up was maintained throughout the study. In
the investigator’s judgment excellent co-operation
was given by the teachers.

The experiment was begun in the fall and
continued through the school year. The necessary
measurements were repeated in the
spring just before the close of school.

At the close of the experiment the difference
in mean grade points between the experimental
group and the control group had
risen to show a critical ratio of the mean to
its standard error of 3.43. An examination of
the data in the tabulation below will give
details.

                         Standard Deviation
                       Of Distri-                Critical
Group           Mean   bution       Of Mean      Ratio
Experimental    3.21     .63          .10
Control         2.97     .68          .11
Difference       .24                  .07         3.48

Thus the experimental group made a significantly
greater academic gain. In interpreting
this finding it should be pointed out that
the teachers were not aware that academic
achievement would be used as one measure of
comparison.

In the attitude test administered at the
beginning and at the end of the experimental
period ten items were included dealing with
the subject’s relation to school and school
work. These items were scattered throughout
a test which in addition to the attitude toward
school included various items relating
to the home, the subject’s ambitions, and the
like. The nature of the attitude test items is
indicated by the following samples:

The following directions were given to
the subjects:

“Your careful answers to the following
questions will help us to help you with your
school work. Your answers to the following
statements will in no way affect your
grades but they will help us to help you.
Read each of the following statements
carefully. If you feel that the statement is
completely or almost completely true, put
a circle around 1, like (1) 2 3 4 5 6 7.

If you feel that it is probably true, or
true in large degree, put a circle around 2
or 3.

If you feel that it is quite undecided, an
open question, put a circle around 4.

If you feel that it is probably false or
false in large degree, put a circle around 5
or 6.

If you feel that the statement is completely
or almost completely false, put a
circle around 7.”

1 2 3 4 5 6 7   Under ordinary circumstances
                good marks are usually the result
                of hard studying.

1 2 3 4 5 6 7   The use of lipstick and rouge,
                if not very noticeable, is permissible
                for girls of high school age.

1 2 3 4 5 6 7   My school work is easier
                for me this year than it
                has been in the past.


An attempt was made to conceal the pur- 
pose of the test by including items relating 
to many areas. It was planned to use only 
the items relating to school when comparing 
the growth of the experimental and control 
groups. Though its purpose was somewhat 
concealed the attitude test was not as indi- 
rect as was desired but was used pending the 
construction of a good attitude test in this 
area.* 

An analysis of the items relating to school 
revealed that by the close of the study the 
experimental and control groups differed 
significantly in their attitude toward school. 
The experimental group was significantly 
more willing to ascribe achievement to 
planned work rather than to chance factors. 
The experimental group felt that careful 
work in the long run would bring its re- 
ward and that an education can be made 
worthwhile. The experimental group felt sig- 
nificantly less the need of cheating. They 
were less willing to be swayed by temporary 
likes and dislikes in school life. They felt 
that their work during the experimental year 
had been more pleasant than that of the pre- 
ceding year. The experimental group also evi- 
denced a more favorable attitude toward their 
school companions and gave fewer indications 
of feelings of inferiority than did the control 
group. On the whole these data seem to indi- 
cate a happier and more logical attitude to- 
ward school and school work on the part of 
the experimental group than the control 
group. 
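The significance statements in this report rest on the critical ratio, that is, a difference divided by its standard error. As a rough check, the short Python sketch below recomputes that ratio for the grade-point comparison from the rounded figures printed in the tabulation (a difference of .24 with a standard error of .07). The function name and the cutoff of 3.0 are illustrative assumptions and are not taken from the study; a ratio of about 3 was a common rule of thumb for significance in work of this period.

```python
def critical_ratio(difference: float, standard_error: float) -> float:
    """Return the ratio of a difference to its standard error."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return difference / standard_error


# Rounded values as printed in the grade-point tabulation.
GRADE_POINT_DIFFERENCE = 0.24
GRADE_POINT_STANDARD_ERROR = 0.07

# Hypothetical cutoff, assumed here for illustration only.
ASSUMED_CUTOFF = 3.0

grade_point_ratio = critical_ratio(GRADE_POINT_DIFFERENCE, GRADE_POINT_STANDARD_ERROR)
exceeds_cutoff = grade_point_ratio >= ASSUMED_CUTOFF

print(f"critical ratio = {grade_point_ratio:.2f}, exceeds cutoff: {exceeds_cutoff}")
```

With the rounded inputs the ratio comes to about 3.43, in line with the figure given in the text; any small departure from a printed value would be expected from rounding of the difference and its standard error.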

In the test for personality conflict several 
types of records were available. These in- 
cluded indications of conflict and types of 
verbal responses. As indications of conflict, 
disturbances in the voluntary movements as 
obtained in the Luria test were used. An anal- 
ysis of the disturbances shows a significant 
decrease from fall to spring on the part of 
the experimental group and a slight but not 


*Ojemann, Ralph H.: A Revised Method for the Measure- 
ment of Attitude. Psychol. Bull., 1937, 34, 752. 


146 JOURNAL OF EXPERIMENTAL EDUCATION [Vol. 8, No. 2 


significant increase on the part of the control adjustments and consequently in their mental 
group. The critical ratio of the change in the _ life tend to be more occupied with personal 
experimental group is 3.45. Thus the experi- difficulties producing a more personalized type 
mental group gave a significant reduction in of response. The experimental group on the 
the scores on the voluntary disturbance por- other hand seems to be extending in the direc. 
tion of the mental conflict test. The data are _ tion of more impersonal, objective, and logical 


detailed in the following tabulation. type of mental life. 
Initial Scores Final Scores 
Standard Standard 
Mean Deviation Mean Deviation Critical 
Group Test of Mean Test of Mean Ratio 
92.3 7.8 61.0 4.6 3.45 
ae 70.4 6.0 74.8 5.3 
By analyzing the verbal responses obtained TABLE 1 
in the course of the conflict test, it is possible Per CENT CHANGE IN VERBAL RESPONSES 
to throw some light on the nature of the cen- Per Cent Ch 
tral processes. The classification of verbal soils Experi 
responses used in this investigation is a very Category Control mental 
simple one. It includes such categories as Coadjunction — 22.92 —8.62 
coadjunction, contrast, predicate relationship, Contrast -- —15.51 —25.95 
Causa: Gepem » Causal dependence 24.44 71.74 
indirect. This classification has been tested aay ee eer —32.83 15.35 
iti i iability. ord-complement 0.00 48.72 
e reliability tends to run in the neignbor- = Association by sound _... 28.57 37.50 
hood of 80 to 85 per cent.* An analysis of Indirect association _.__~- 45.15 —.56 
the verbal responses in this study indicates -------------- 
that both groups tend to change toward a Repetition 
more complex type of response. (Table 1) 
However, the direction of complexity varies As for the ratings of general adjustment, 


with the two groups. In the control group the in this area too the experimental group 
the of showed a significant change over the control 
of response. This is indica - 
- ; : : group. The critical ratio of the difference to 
its standard error is 4.6 as shown in the tabu- 
lation below. These ratings corroborate the 
response. In the experimental group, however, detailed findi f the attitude and 
there is a reduction in the egocentric, predi- 
cate, and indirect types and a very consider- Personality conflict tests. It is realized that 
able increase in the category causal depend- atings made by teachers who participated 7 
ence. Such an analysis tends to indicate that the study may have a subjective bias. They 
are included primarily to indicate what the 


the subjects making up the control group are : - 
finding it relatively more difficult to make — —_— of the general adjustment 0! 
the student. 


* Ibid. 
Initial Scores Final Scores 

Standard Standard 

Mean Deviation Mean Deviation Critical 

Group Test of Mean Test of Mean Ratio 

Is He Interested in His School Work? 

3.27 a 2.33 10 4.94 

3.12 .09 2.86 ll 1.94 
Is He Apparently Adjusted in the Classroom? 

ET 3.45 13 2.57 13 4.62 
3.03 10 2.80 10 171 




Many interesting illustrations could be 
given as to the changes teachers evidenced in 
their attitude toward and treatment of pupils 
after they had gained some insight into the 
personality of the experimental subjects. One 
teacher unwittingly gave away this bit of 
information about one of the subjects: 


“I welcomed the information concerning
H. D. He always appeared to me to be 
well-mannered but very independent and 
resentful. So independent in fact that I 
hadn’t bothered very much with him. Nat- 
urally I was quite surprised to learn that 
in reality he was unhappy. At every oppor- 
tunity that presents itself I am now endeav- 
oring to assure him by my attitude that all 
of us have a personal interest in his wel- 
fare. I am trying to make him feel that he 
is definitely a part of the group.” 


Another teacher: 

“After your account of L. M. I see her 
as an unhappy child rather than an inso- 
lent one. I find it easier to accept her.” 


A sewing teacher: 

“I was very much interested in the in- 
formation concerning G. B. I had previ- 
ously caught myself wishing I knew more 
about her home life as she always appeared 
to be undernourished and inclined to be the 
‘mousy’ type. After learning that she received
so little encouragement at home I
endeavor to praise her school work at 
every opportunity that arises and I notice 
she beams at every word.” 


An English teacher: 


“After discovering it was shyness and 
nervousness rather than sulkiness which 
prevented L. C. from reciting I made a spe- 
cial effort to see what could be done to help 
him overcome the difficulty. I seated him 
so he could be centrally located, praised 
him at every reasonable opportunity, encouraged
him not to do things alone but in
company with his classmates as asking him
along with others to pass papers, and occa- 
sionally to read aloud.” 


These comments show the beginning of more 
complete understanding of pupil behavior. 
Shyness, resentment, over-aggression, and in- 
difference are known to be motivated often by 
conflicts and frustration. They are signals 
not for neglect or for the drawing of battle 
lines but for the need of mutual understanding 
and helpfulness. 

The data obtained in this study are con- 
sistent in showing that when teachers learn 
to know their pupils as personalities in their 
respective environments teachers tend to be- 
come more effective guides for learning—the 
pupils achieve more in academic areas—and 
teachers also become more effective person- 
ality “developers.” 

What do these findings signify for the 
school? They indicate that it is not enough 
to be concerned about “problem children” 
when it comes to personality development. 
Teachers to be effective guides for learning 
must know their pupils not as entities in the 
classroom but as living personalities with 
ambitions, attitudes, conflicts, and problems, 
coming from environments that vary greatly 
in the encouragement or discouragement 
effected. 

For teacher education data in this study 
indicate the importance of including training 
in understanding of child development and 
interpretation of child behavior. For admin- 
istration the results signify the importance 
of devising machinery by which teachers can 
learn to know their pupils in terms of their 
ambitions, hopes, and struggles in both in- 
school and out-of-school environments. The 
administrative difficulties are not insurmount- 
able and the expense is not great when com- 
pared with the many pounds of cure repre- 
sented by the many ounces of prevention that 
can be effected by the understanding teacher. 


SOME FAMILY LIFE PATTERNS AND THEIR RELATION 
TO PERSONALITY DEVELOPMENT IN CHILDREN* 


LELAND H. STOTT
University of Nebraska 


One of the first problems which must be 
dealt with in the study of family life and the 
home environment in relation to the person- 
ality adjustments of children is that of deter- 
mining the identity and the nature of the 
home-environmental factors themselves. Home 
life, like any other area of human activity, is 
extremely complex and assumes many forms 
and patterns. Information may be obtained 
regarding innumerable characteristics, prac- 
tices, relationships and activities which con- 
stitute family life. Certain of these items of 
information have obvious interrelationships 
and thus may logically be grouped together, 
but in general they are difficult to classify and 
to coordinate. 


One common procedure is to treat each 
item as a separate factor and to determine 
individually the relationship of each to the 
particular variable under consideration. The 
outcome is rather unsatisfying, however, be- 
cause, due to the varying degrees of intercor- 
relation among the items, reliable estimates of 
their relative significance, either individually 
or in combinations, are extremely difficult to 
obtain. They may not legitimately be grouped 
in any additive manner, yet the groupings 
and patternings of the items, rather than the 
individual items themselves, appear to consti- 
tute the factors of importance in family life. 
Some means of determining these fundamental 
groupings or patterns is essential to an ade- 
quate study of family life in relation to per- 
sonality development. The present study was, 
therefore, made in an effort to determine the 
most useful and significant manner of group- 
ing the items of a particular home life ques- 
tionnaire. It was hoped thereby more clearly 
to define some of the important patterns of 
family life. 

*No. 248, Journal Series, Nebraska Agr. Exp. Station. This is
one of a series of papers dealing with the results of a research
project, the general subject of which was the relation between
home-environmental influences and personality adjustments in
children.


I. THE ANALYSES OF THE QUESTIONNAIRE
DATA: DETERMINATION AND IDENTIFICATION
OF PATTERNS


The data with which this study was con-
cerned were the responses of 1,855 adolescents
to the sixteen selected questionnaire items 
listed below. The subjects were all high 
school students from farm, small town and 
city homes of Nebraska. Since the three gen- 
eral home situations differed in certain re- 
spects one from another, it was decided to 
study each separately. The subjects were, 
therefore, treated as three separate popula-
tions of 695, 640 and 520 from farm, small
town and city respectively.


Tetrachoric intercorrelations among the 
items were obtained by use of computing 
diagrams (1). The three correlation tables 
were then subjected to factor analysis by 
means of Thurstone’s centroid method (§).
In each case, the number of zero factor load-
ings in the factor matrix was maximized by
use of the graphical method of rotating pairs 
of axes in one plane at a time (2). An attempt 
was then made to interpret the rotated factors 
as “patterns of family life” and as factors in 
the social environment of the home. 
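As a rough illustration of the first step, a tetrachoric correlation for one pair of yes/no items can be approximated directly from the fourfold table of answers. The sketch below uses the cosine-pi approximation rather than the computing diagrams cited above; the function name and the cell counts are hypothetical and are not taken from the study.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Approximate a tetrachoric r from a fourfold table of counts.

    a = yes/yes, b = yes/no, c = no/yes, d = no/no.
    Uses the cosine-pi approximation r = cos(pi / (1 + sqrt(ad / bc))).
    """
    if min(a, b, c, d) < 0:
        raise ValueError("cell counts must be non-negative")
    ad, bc = a * d, b * c
    if ad == 0 and bc == 0:
        raise ValueError("correlation is undefined for this table")
    if bc == 0:
        return 1.0   # no disagreeing answers at all
    if ad == 0:
        return -1.0  # no agreeing answers at all
    return math.cos(math.pi / (1.0 + math.sqrt(ad / bc)))


# Hypothetical counts for two items answered by 100 subjects.
print(round(tetrachoric_cos_pi(40, 10, 10, 40), 2))
```

The approximation is closest when both items are split near the middle; for an item that is very rarely answered "yes" one cell is nearly empty and the estimate becomes unstable.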


It should be made clear, however, that
these “patterns” are regarded as tentative and
as subject to the narrow limitations of the
particular set of items used. Some of them,
therefore, may not adequately be described,
or accurately characterized in terms of the
present items. Furthermore, in attempting to
interpret the factors, we were confronted with
the question of whether, in some cases, they
did not represent something about the subject
himself, rather than conditions in his external
social environment. The original data were
reactions of boys and girls to questions re-
garding the home situation, and not directly
observed and objectively recorded facts about
that home situation. Certain of the questions
were designed to get at some of the person-
to-person relationships between parent and
child. These items¹ asked the subject concern-
ing his own behavior, or for his own personal
reactions toward the behavior of his parents.


It is, of course, impossible, in terms of the
data at hand, to determine to what extent the
personality of the subject, rather than the
actual quality of his family life, determined
his answer to a given question, such as
whether or not he habitually con-
fided in his mother. We do know that there
is a certain degree of correlation between
the reactions to such questions and certain
socially desirable personality traits. These
correlations, however, tell us nothing regard-
ing causality, i.e., whether or to what extent
a home situation in which certain “favorable” 
customs and certain relationships between 
parents and children have been established, 
is particularly favorable to the development 
of those desirable traits, or whether the pos- 
session of the traits by the youngster tends 
to lead to the establishment of the “favor- 
able” customs and relationships. 


In relation to the present question as to 
the nature of the factors which resulted from 
the analyses of the questionnaire data, how- 
ever, we are more directly concerned about 
the degree of correspondence between the 
subjects’ representations of their family situ- 
ations and the actual situations as they ex- 
isted. Even though the extent to which the 
personal attitudes and prejudices of the sub- 
jects colored their answers to the question- 
naire items is not known, nevertheless it 
seems safe to assume that the correspondence 
between report and actual situation was, on 
the average, reasonably close. It is upon that
assumption that the following interpretations 
are based. They are offered, however, with 
the realization that in some instances the 
factors may reflect, to some extent, certain
culturally determined ideals, attitudes or 
prejudices of the subjects themselves. 


The present results are of value chiefly, 
perhaps, in that they demonstrate the val- 
idity and usefulness of the factor method of 
determining and identifying the important 
factors of the home environment. Certain of 
the factors herein described are regarded as 
family-life patterns of considerable signifi- 
cance in relation to the personality develop- 
ment of children. 

¹The items particularly referred to here are items 7, 8, 9,




Questionnaire Items Included in 
the Analysis 


1. Does your mother work outside the 
home? 

2. Do any paying roomers live in your 
home at the present time? 

3. Does your family have meals together 
at regular hours? 

4. Do you go with your family on visits, 
trips, picnics, to church, shows, enter- 
tainments, etc.? 

5. Does your family have enjoyable times 
together in your home playing games, 
reading aloud, telling stories, or sing- 
ing and playing instruments? 

6. Do your parents like to have you bring 
your friends or have them come to 
your home? 

7. Do you tell your mother your joys and 
troubles? 

8. Do you tell your father your joys and 
troubles? 

9. Do you often kiss your mother? 

10. Has your mother been sick in bed dur- 
ing the past year? 

11. Has your father been sick in bed dur- 
ing the past year? 

12. Is your mother nervous? 

13. Is your father nervous? 

14. Does your mother do things which you 
do not like? 

15. Does your father do things which you 
do not like? 

16. Were you scolded or punished at home 
last week? 


Farm Family Life Patterns —The intercor- 
relations of the items as answered by the 
farm group are presented in Table 1. The 
coefficients ranged in size between −.17 and
+.88. The analysis of the correlation matrix
was carried through five factors. At that point
only three of the 210² residuals were greater
than zero in the first decimal place, and with
one exception, they were all less than two
times the standard error of an original cor-
relation of zero.³ The five factors, therefore,
account for practically all of the variance 
represented by the item intercorrelations. 
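The extraction step described above can be sketched for a single factor as follows. This is a minimal illustration of one centroid step under stated assumptions, not a reproduction of the analysis: the three-item correlation table is hypothetical, the communality estimate (largest off-diagonal correlation in each column) is an assumed convention, and the sign reflection needed before extracting later factors is left out.

```python
import numpy as np


def centroid_factor(R, communalities=None):
    """Extract one centroid factor; return (loadings, residual table)."""
    R = np.array(R, dtype=float)  # work on a copy
    if communalities is None:
        # Assumed convention: largest absolute off-diagonal value per column.
        off_diagonal = R - np.diag(np.diag(R))
        communalities = np.abs(off_diagonal).max(axis=0)
    np.fill_diagonal(R, communalities)
    column_sums = R.sum(axis=0)
    total = column_sums.sum()
    if total <= 0:
        raise ValueError("table sum must be positive; reflect signs first")
    loadings = column_sums / np.sqrt(total)
    residual = R - np.outer(loadings, loadings)
    return loadings, residual


# Hypothetical correlation table for three items (not taken from Table 1).
R = [[1.00, 0.48, 0.40],
     [0.48, 1.00, 0.30],
     [0.40, 0.30, 1.00]]
loadings, residual = centroid_factor(R)
print(np.round(loadings, 2))
print(np.round(residual, 2))
```

In a full analysis the residual table would be examined after each factor, and extraction would stop once the residuals were negligible in the sense described in the text.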


²Item no. 2, regarding roomers in the home, was so rarely
answered in the affirmative by the farm subjects that correla-
tions involving it could not be computed. The analysis for that
group, therefore, included only the 210 intercorrelations of
the remaining 15 items.

* The standard error of the tetrachoric correlation of zero 
— KX, 4 cases and with the dichotomies divided at the 




A satisfactory transformation of the factor 
matrix was obtained after seven rotations of 
different pairs of axes. The factors after rota- 
tion are presented in Table 2. 
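A single rotation of one pair of axes can be sketched as below. In the study the angles were chosen graphically; here the angle is simply supplied, and the loadings are hypothetical values chosen so that a 45° rotation produces zero loadings. The function name is an illustrative assumption.

```python
import numpy as np


def rotate_pair(loadings, i, j, degrees):
    """Rotate factor axes i and j in their plane; other axes are unchanged."""
    theta = np.radians(degrees)
    c, s = np.cos(theta), np.sin(theta)
    rotated = np.array(loadings, dtype=float)  # copy of the factor matrix
    col_i, col_j = rotated[:, i].copy(), rotated[:, j].copy()
    # Coordinates of each item on the two rotated axes.
    rotated[:, i] = c * col_i + s * col_j
    rotated[:, j] = -s * col_i + c * col_j
    return rotated


# Hypothetical unrotated loadings for four items on two factors.
A = np.array([[0.60, 0.60],
              [0.50, 0.50],
              [0.55, -0.55],
              [0.45, -0.45]])
B = rotate_pair(A, 0, 1, 45)
print(np.round(B, 2))
# The communalities (row sums of squared loadings) are the same before and after.
print(np.round((A ** 2).sum(axis=1), 3), np.round((B ** 2).sum(axis=1), 3))
```

Because the axes stay at right angles, a rotation of this kind redistributes each item's loadings between the two factors without changing its communality.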

In interpreting these factors, the factor 
loadings which were less than three times the 
standard error of an original correlation of 
zero were regarded as insignificant. Factor 
I_f, accordingly, has only three significant
loadings, and none of these is particularly 
large. This factor may, therefore, be consid- 
ered as of relatively little importance, so far 
as this particular group of items is concerned. 
Interpreting it as a pattern of family life, the 
family following this pattern might be de- 
scribed in terms of the significantly loaded 
items, as follows: (1) frequent good times 
are enjoyed in the home by the family group, 


⁴Since no method is yet available for determining the stand-
ard error of a factor loading, it is, of course, not known how
large a factor loading in any analysis must be in order to be
significant. However, for the want of a better criterion of sig-
nificance the one here used is becoming customary.




(2) parents like to have their children bring 
their friends, or have them come into the 
home, and (3) the children are frequently 
scolded or punished. 

This last item seems, at first thought, to 
be quite inconsistent with the other two which 
make up the pattern. However, since scolding 
and other mild forms of punishment are so
common even in families characterized gen-
erally by a congenial and happy atmosphere,
there is probably no real inconsistency in the 
pattern. For present purposes it may be desig- 
nated the “congeniality” factor. 
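The significance criterion used in interpreting the factors can be made concrete with a short calculation. The sketch below assumes the usual large-sample formula for the standard error of a tetrachoric correlation of zero, and assumes that both items are divided at the median; the group sizes are those given earlier in the text, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def se_tetrachoric_zero(n, p1=0.5, p2=0.5):
    """Standard error of a tetrachoric r of zero for n cases.

    p1 and p2 are the proportions of "yes" answers on the two items.
    """
    std_normal = NormalDist()
    # Ordinates of the unit normal curve at the two points of division.
    z1 = std_normal.pdf(std_normal.inv_cdf(p1))
    z2 = std_normal.pdf(std_normal.inv_cdf(p2))
    return math.sqrt(p1 * (1 - p1) * p2 * (1 - p2)) / (z1 * z2 * math.sqrt(n))


for group, n in {"farm": 695, "small town": 640, "city": 520}.items():
    se = se_tetrachoric_zero(n)
    print(f"{group}: SE = {se:.3f}, 2 x SE = {2 * se:.3f}, 3 x SE = {3 * se:.3f}")
```

With median splits the formula reduces to (π/2)/√N. Items divided far from the median give a larger standard error, so these figures are best read as lower bounds.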

Factor II_f is the most clear cut of the five.
Seven of the 15 items have significant load- 
ings in this factor, while the remaining eight 
have loadings of zero or near zero. The fam- 
ily characterized by this particular pattern 
may be described in terms of the heavily 
loaded items as follows: (1) conditions and 
relationships are such that the children habit- 


TABLE 1 


INTERCORRELATIONS AMONG THE QUESTIONNAIRE ITEMS AS ANSWERED BY THE FARM SUBJECTS 


Item 1 3 4 5 6 7 


09 

.30 

.00 .16 .40 

01 .20 .2 .8 

7..... .@ .© .2 .3 
8..... —.17 06 .25 .39 .22 
28 .26 .29 .47 
10__.. .18 03 .05 —.04 —.01 
= -11 .20 08 —.06 —.15 04 
.03 .14 —.02 —.05- .00 08 
.06 —.03 —. 02 05 08 
.08 .10—.06 .07 12 
.01 .08 —.04 05 =.00 01 
16..... .07 .10—.08 .16 16 10 


8 9 10 ll 12 13 1415 


. 22 
.00 —. 14 
—.03 .00 .40 
.08 —.17 12 
.15 .17 22. 57 
.03 .08 16 .15 08 
.02 —. 03 13 .12 = .12 12 8 
03 13 16 .12 .09 02 33 36 


TABLE 2 


FACTOR LOADINGS AND COMMUNALITIES AFTER ROTATION OF AXES, DATA FROM
THE FARM SUBJECTS 


Item Ir Ilr 
—.02 . 03 
. 02 36 
. 50 .45 
. 40 
—.09 . 89 
.10 . 63 
.10 51 

—.07 
—.14 03 
.05 —.04 
. 03 . 02 
—. 08 . 06 
—.01 —.02 
. 32 . 05 


Ve h2 
—.06 —.12 . 34 . 182 
. 08 . 09 . 34 . 257 
—.08 —. 08 .24 376 
—.04 —.02 .02 . 455 
—.06 . 06 —. 06 . 368 
.20 . 02 —.13 . 860 
.31 03 —. 33 614 
—.21 .12 . 08 . 341 
. 42 .12 .43 . 391 
. 25 .15 . 52 379 
. 15 . 09 . 05 574 
. 65 . 08 .04 . 428 
oan .95 —. 05 916 
a . 85 —.09 . 822 
05 .41 .12 . 283 




ually confide in both parents (tells them their 
joys and troubles), (2) the children fre- 
quently accompany the family group on trips, 
picnics, and other recreational excursions out- 
side the home, (3) the children frequently 
kiss their mother, (4) the family group often 
has enjoyable times together in the home, 
(5) the parents like to have the children’s
friends come into the home, and (6) meals 
are usually eaten together as a family group 
at regular hours. The picture here is one of 
confidence, affection and companionability. 


It will be noticed that two of the three
significant items of Factor I_f are also sig-
nificantly loaded with Factor II_f. This sug-
gests, of course, that the two factors are, to
some extent, correlated. With the method of
axis rotation employed, the original orthog-
onality of the centroid axes was maintained
and hence, the factor loadings, as shown in
the table, are expressed in terms of that or-
thogonal system. In the instance of Factors I_f
and II_f, however, oblique axes would better
have fit the distribution of points. By rotat-
ing independently the two axes so as to effect
this better fit, their angular separation would
have been reduced from 90° to 54° and the
correlation between them thus increased from
zero to +.59.⁵ The two factors, then, al-
though far from being identical, may be
regarded as somewhat related to each other.
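The relation between the angle separating two axes and their correlation can be checked with a few lines. The function name is an illustrative assumption; the two angles are those mentioned in the text.

```python
import math


def axis_correlation(angle_degrees):
    """Correlation between two factor axes separated by the given angle."""
    return math.cos(math.radians(angle_degrees))


print(round(axis_correlation(90), 2))  # orthogonal axes: 0.0
print(round(axis_correlation(54), 2))  # oblique axes: 0.59
```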

With Factor III_f the picture is quite dif-
ferent. Items 12 and 13, having to do with
nervousness in father and in mother, are the 
most significant in this pattern. In addition, 
illness of mother, illness of father, a confiden- 
tial relationship between father and child and 
the tendency not to kiss mother enter the 
pattern. Perhaps “nervous tension” might be 
used as a short designation of this factor. 


Another pattern, entirely independent of 
the last, but which also suggests an unfavor-
able family situation, is seen in Factor IV_f.
Like Factor I_f, only three items have loadings
in it of any significance. Two of these, items 
14 and 15, have very large loadings. They 
involve the admission or the accusation from 
the subject, that the mother and the father 
do things which he does not like. The third 
item is the admission of having been punished 
during the week. To what extent this factor
reflects a temporary critical attitude or a re-
sentment in the subject himself due to recent


⁵The cosine of the angle separating two axes expresses the
degree of correlation between them. A discussion of this use
of the cosine is given by Guilford (2, 474).




punishment, or to what extent it represents a 
permanent family situation characterized by 
parental misconduct and family disharmony, 
of course, cannot definitely be determined 
from the present data. In a later section of 
this paper, however, some evidence is pre- 
sented which seems to support the view that 
the factor represents a characteristic family 
situation. 

Factor V_f presents some difficulty in inter-
pretation. None of the significant factor load-
ings are more than moderately large and most
of them are rather small. The pattern is as 
follows: (1) father was ill in bed sometime 
during the year, (2) mother was ill in bed 
sometime during the year, (3) the family has 
regular meals together, (4) the mother works 
outside the home,⁶ (5) confidential relation-
ships do not exist between father and children
and (6) the young people accompany the 
family group on recreational excursions out- 
side the home. 


The picture here is somewhat mixed. The 
illness of the parents, the mother participat- 
ing in the outdoor work and the tendency in 
the children not to confide in father present 
the negative side. The customs of regular 
family meals and of going on recreational ex- 
cursions as a family group suggest family 
integration. The mother’s outside work might 
even be thought of as supporting the picture 
of integration of purpose and cooperation. 
This pattern perhaps represents the rather 
typical, hard-pressed farm family situation. 


The analysis of the intercorrelations among 
fifteen home life items as reacted to by the 
farm adolescents, then, revealed five patterns 
or factors of family life. These were (1) a 
pattern characterized by congeniality, (2) one 
somewhat related to the first, but perhaps 
more important, which was characterized by 
confidence, affection and companionability, 
(3) a pattern which would be regarded as 
“unfavorable” from the standpoint of child 
development, characterized by nervous ten- 
sion, illness and the lack of demonstration of 
affection, (4) another “unfavorable” pattern 
whose interpretation is perhaps even less sure 
than the others but which probably represents 
a situation characterized by the lack of wis- 
dom and by misconduct on the part of par- 
ents, and (5) a pattern, perhaps not uncom- 
mon among the hard pressed farm population, 


⁶For farm families this usually means that the mother helps
in the field work on the farm.




in which integration of purpose and coopera- 
tion is suggested but which also includes over- 
work, particularly on the part of the mother, 
some tendency for the parents to be ill and 
the lack of confidential relationships between 
the father and the children. 


Small Town Family Life Patterns.—The 
table of item intercorrelations (Table 3) for 
the small town group, in general, resembles 
quite closely that for the farm group. Some 
of the correlations involving item 1, which 
asked whether the mother worked outside the 
home, however, were somewhat different in 
magnitude. For example, the correlation be- 
tween mother working and regular family 
meals in the case of the town group was +.31
as compared with +.09 for the farm group.


TABLE 3 


INTERCORRELATIONS AMONG THE QUESTIONNAIRE ITEMS AS ANSWERED BY 
THE SMALL TOWN SUBJECTS 


Item 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
.18 
. 31—. 09 
—.03—.02 .27 
.18—.12 .18 .58 
.12—.05 .13 .30 .38 
C—O .08—.18 .07 .20 .21 .17 .78 
Sse .00 .00 .07 .26 .37 .28 .40 .38 
eee —.06 .15 .10—.02—.10—.16 .11 .03—.08 
.18 .23 .10—.09—.09—.05 .05 .138—.02 .46 
.O7—.02 .11 .10 .14 .06 .07 .17—.05 .10 .22 .40 
.00 .06 .03. .15—.02—.02 .05 .13 .22 .05 .16 .05—.08 
— .02 .05 .02 .06—.05—.03 .18 .21 .22 .10 .27 .08—.05 .87 
.10 .00—.00 .06 .05 .10 .00 .08—.02 .10 .10 .13 .00 .22 .25 
TABLE 4 


FACTOR LOADINGS AND COMMUNALITIES AFTER ROTATION OF AXES, DATA FROM
THE SMALL TOWN SUBJECTS 


Factors 
Item I_t II_t III_t IV_t V_t VI_t h²
03 —.01 04 16 . 43 . 32 310 
Pas . 03 . 09 13 —.20 —.17 | . 229 
—. 03 .18 12 26 . 35 . 276 
Sa 03 . 56 02 43 . 03 —.00 . 502 
—.07 . 64 05 43 . 09 —.03 . 610 
| Ee —.02 . 37 03 38 .07 —. 06 . 294 
ol —.10 10 87 —.31 10 879 
ae 17 —.24 —.14 78 —.12 —.07 733 
eae 16 .29 —.16 45 —.23 .07 392 
07 —.12 34 —.03 —.19 .42 348 
ot 22 —.15 27 —.01 —.03 . 63 535 
03 .09 69 12 —. 00 519 
ae —.08 —.02 50 15 .09 . 05 294 
89 .09 —.04 04 —.07 . 03 813 
94 ~.04 01 08 —. 06 07 896 
30 ll 16 —.02 12 02 138 




Town mothers who work must have their 
meals at regular hours. On the farm, regular 
meals are more frequently the rule regardless 
of whether or not the mother helps with the 
outside work. As has already been suggested,
for a town mother to work outside the home
usually means something quite different than
for a farm mother to work outside the home.
Many of the town mothers who were re-
ported as working had jobs that often lent
some degree of prestige to the family, such,
for example, as teaching school, clerking in a
store or office work. Correlations involving
this item might, therefore, be expected to
differ between the farm group and the town
and city groups. Correlations with item 2,
roomers in the home, were also included in
this analysis.




It was necessary, in this instance, to carry
the analysis through six factors. These factors 
as they appeared after nine rotations of axes 
are presented in Table 4. 


As might be expected, there is considerable 
resemblance between the sets of factors from 
the town and the farm groups. At the same
time, some small differences are worthy of
note. Factor I_t clearly represents the some-
what ambiguous pattern which was called
“parental misconduct” or “family dishar-
mony” in the previous analysis. The heavily
loaded items are, in order, (1) father does
things not liked, (2) mother does things not
liked, (3) child was punished during the past
week, and (4) the father had been ill. The
last item, which has a very low (barely sig- 
nificant) loading, does not appear in the cor- 
responding farm pattern. Otherwise the two 
patterns correspond almost exactly (factor 
loadings of nearly the same magnitude). 


Factor II_t does not so closely identify itself
with any particular pattern of the previous
analysis. It does, however, give a clearer pic-
ture of “family congeniality” than the farm-
home factor thus designated (Factor I_f). It
includes two of the three significantly loaded
items of Factor I_f, and in addition three oth-
ers, two of which definitely contribute to the
picture of congeniality. The significant items 
in order are as follows: (1) enjoyable times 
together in the home, (2) children join the 
family group in out-of-home recreational ex- 
cursions, (3) parents like to have the children 
bring their friends into the home, (4) children 
frequently kiss their mother, and (5) children 
do not often tell their joys and troubles to 
their father. This last item is the only dis- 
cordant note. Its loading, however, is quite 
low (−.24) and hence should not be given
much weight. 


Factor III_t is made up of only four items
and only two of those have loadings of even
moderate magnitude. The pattern, however,
is fairly clear cut and is quite similar to the
farm-home factor which was called “nervous
tension”. Two of the least important items in
the farm pattern do not appear here. The four
items in order of importance are (1) mother
is nervous, (2) father is nervous, (3) mother
was sick in bed during the year, and
(4) father was sick in bed during the year.
It is clearly a parental health and adjustment
factor.




Factor IV_t corresponds almost exactly with
Factor II_f. As was the case in the previous
analysis, this pattern is the most clear cut of 
all and is perhaps the most important one. 
The significant items in order are (1) children 
tell their mother their joys and troubles, 
(2) children tell their father their joys and 
troubles, (3) children frequently kiss their 
mother, (4) good times are enjoyed as a 
family group in the home, (5) children join 
the family group in recreational excursions 
outside the home, (6) parents like to have 
children bring their friends into the home, and 
(7) meals are eaten together at regular hours. 
The pattern clearly suggests mutual “confi-
dence, affection and companionability.”


Only four items of moderate to low load-
ings appear in Factor V_t. The items are
(1) mother works outside the home, (2) fam- 
ily has regular meals together, (3) children 
do not tell their mother their joys and trou- 
bles, and (4) they do not often kiss their 
mother. This pattern is probably character- 
istic of the working-mother family. Meals, of 
necessity, are taken regularly because of the 
mother’s work. Also because of her work, she 
is not so constantly a part of the home life 
of the children and hence the confidential and 
affectional relationships between her and the 
children are not so frequently and readily 
formed. 


Factor VI_t lines up fairly closely with
Factor V_f. The pattern again suggests the
struggling, hard-hit sort of family situation, 
in which the health of the parents is not espe- 
cially good, and in which the support of the 
family depends upon the mother’s outside 
work and room rent from paying roomers. 
The items are (1) father sick in bed during 
the year, (2) mother sick in bed during the 
year, (3) roomers in the home, (4) mother 
works outside the home, and (5) regular 
family meals.


The analysis of the home-life questionnaire 
data from small town adolescents, then, 
yielded “patterns” which, for the most part,
corresponded with those obtained from the 
analysis of farm-home life data. In one case 
(the factor which suggested “confidence, 
affection and companionability”) the corre- 
sponding patterns were practically identical. 
Each of four other patterns was readily recog- 
nized as having much in common with one of 
the remaining four of the farm-family life 
patterns. An additional town-family life pat-
tern was suggested. This pattern appeared to
fit the working-mother situation in which
family meals tend to be regular, but where
little tendency for the children to confide in
the mother or to express affection for her is
shown.

City Family Life Patterns —With a few 
rather interesting exceptions, the table of item 
intercorrelations (Table 5) for the city group 
is quite similar to those for the other groups. 
The item combinations giving the highest cor- 
relations in most cases coincide in the three 
tables. For the city families, however, the
custom of regular family meals was found to
correlate somewhat more closely with certain
other favorable items, such as the custom of
going on family excursions or confiding in
father. These correlations for the city group
were +.43 and +.33 as compared with
+.27 and +.07 for the town group, and
+.30 and +.06 for the farm group. Among
the “unfavorable” items, nervousness in
mother, in the case of the city group, tended
to correlate more closely with the child’s ad-
mission that he disliked something in his
parents’ behavior (+.31 and +.36 as com-
pared with +.06 and +.08 for the town
group and +.21 and +.12 for the farm
group), and less closely with illness of parents
(+.18 and .00 as compared with +.32 and
+.22, and +.34 and +.12).

These and other small differences in degree
of item intercorrelation between the urban and
the two rural groups are, of course, reflected
in the “patterns” as revealed by the factor
analysis.

Six factors were obtained in this analysis. 
Ten separate axis rotations were required in 


TABLE 5 
INTERCORRELATIONS AMONG QUESTIONNAIRE ITEMS AS ANSWERED BY THE CITY SUBJECTS 


Item 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
.13 
-10 .18 
.18—.07 .43 
-08 .02 .28 .48 
.0O7 .12 .09 .32 .40 
.05—.038 .10 .40 .30 .31 
.12—.04 .33 .43 .31 .07 .75 
—.02—.05—.02 .18 ,13 .16 .38 .33 
.08 .01—.05 .08 .02—.02 .10 .10—.06 
ree —.02—.29 .00—.07—.03—.15 .03—.12 .03 .29 
C—O .12—.13 .11 .14 .12 .02—.02 .10 .12 .18 .00 
eee *.09 .23 .18 .07 .09 .13—.03 .15 .03 .14 .22 .46 
eee .19—.02 .12-.17 .16 .06 .06 .15 .09 .16 .10 .31 .08 


Oe .18 .02 .19 .05 .10 .14 .04 .16 .14 .08 .13 .36 .20 .8 
.09 .08—.02 .16 .03 .25 .08 .30 .33 


TABLE 6 


FACTOR LOADINGS AND COMMUNALITIES AFTER ROTATION OF AXES, DATA FROM
THE CITY SUBJECTS


Factors 
Item I_c II_c III_c IV_c V_c VI_c h²
. 00 .07 .16 —.02 . 08 25 099 
eS ae —.48 .14 —.00 —. 06 . 06 .40 420 
—. 23 .09 .19 42 .20 —.04 315 
. 00 —.04 . 00 . 60 .48 —.11 603 
—.01 —.07 . 03 . 61 .43 . 08 460 
eee —.07 —.07 —.05 . 36 . 35 . 25 . 328 
_ Ee .17 . 08 —.06 —.08 . 89 —. 03 832 
.29 . 08 —. 02 . 80 —.20 761 
. 05 -11 .01 . 37 —. 26 228 
ae . 25 —.08 . 50 —.12 . 06 —.05 338 
. 52 .12 .10 —.01 —.08 —.10 307 
as 10 42 . 39 . 23 —.03 —.03 392 
—.07 .74 .20 .24 —.04 . 08 655 
15 —.14 . 84 .12 .15 790 
01 —. 02 91 11 08 10 858 
14 . 02 32 —.02 . 09 24 184 




order to obtain the maximum number of zero 
loadings. The factors after rotation are shown 
in Table 6. 
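For orthogonal factors, the communality of an item is the sum of its squared loadings, which gives a simple check on the tabled values. The sketch below applies that check to three rows of Table 6 as they can be read from the scanned table; the assignment of these rows to items 4, 13 and 15 is inferred from their position and should be treated as an assumption.

```python
# Loadings on factors I_c to VI_c, followed by the printed communality.
rows = {
    4: ([0.00, -0.04, 0.00, 0.60, 0.48, -0.11], 0.603),
    13: ([-0.07, 0.74, 0.20, 0.24, -0.04, 0.08], 0.655),
    15: ([0.01, -0.02, 0.91, 0.11, 0.08, 0.10], 0.858),
}


def communality(loadings):
    """Sum of squared loadings across orthogonal factors."""
    return sum(x * x for x in loadings)


for item, (loadings, printed) in rows.items():
    print(item, round(communality(loadings), 3), printed)
```

The small discrepancies that remain are of the size expected from loadings rounded to two decimal places.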

For the most part, the constellations of
items are similar to, but not identical with, those
found in the previous analyses. Factor I_c,
however, has little resemblance to any previ-
ously described. The items of the pattern, in
the order of their importance, are as follows: 
(1) the father was ill in bed during the year, 
(2) no paying roomers in the home, (3) the 
mother was ill in bed during the year and 
(4) the family does not have meals together 
at regular hours. The family situation here 
suggested presumably is not one which would 
contribute favorably to the personal develop- 
ment of the children. Perhaps to describe the 
opposite aspect of the pattern (the reflection
of the factor vector) might give a more positive
and meaningful picture. The family life pat- 
tern would, then, be one in which the father’s 
health is good, paying roomers are present in 
the home, the mother’s health is good and the 
family members eat together at regular hours. 
This pattern probably represents a very com- 
mon situation in urban American family life. 


Factor II_c involves only three items, the
factor loading in only one of which is par-
ticularly large. However, since the remaining
13 items have loadings of zero or near zero,
its meaning is quite definite. Although fewer
items are involved, the factor corresponds
fairly closely with Factor III_f and Factor III_t,
which were designated “nervous tension”.
The three items with significant factor load-
ings are: (1) father nervous, (2) mother
nervous, and (3) children tell father their
joys and troubles. The third item, which also
appears in the corresponding constellation in
the first analysis (Factor III_f), had a factor
loading of only .29 and is therefore of rela-
tively little importance.

Six items are involved in Factor III_c, two
of which have quite large loadings. The family charac-
terized by this pattern would be one in which
(1) the father does things for which the chil-
dren are willing to criticize him, (2) the
mother does things for which the children
criticize her, (3) the mother was sick in bed
during the year, (4) the mother is nervous,
(5) the children are frequently punished and
(6) the father is nervous. It will be recalled
that the first two items and item 5 of the list
are the three items constituting Factor IV_f
and three of the four items of Factor I_t.




These factors were regarded as somewhat 
ambiguous but were provisionally labeled 
“parental misconduct” or “disharmony”. 
These labels, perhaps, fit the present pattern 
equally well even with the addition of the two 
or three extra items. 


The fourth pattern (Factor IV_c) also in-
volves six items. Most of the loadings, how-
ever, are small and none of them are of more
than moderate magnitude. The picture is 
clearly one of a “favorable” family situation. 
The most important items are: (1) children 
accompany the family group on picnics, visits, 
etc., (2) the family group has frequent enjoy- 
able times in the home, (3) the family group 
has meals together at regular hours, (4) the 
parents like to have the children bring their 
friends into the home, (5) the father is 
nervous and (6) the mother is nervous. This 
pattern, although not the same as Factor I_f
and Factor II_t, is nevertheless one that might
be designated by the term “family congenial-
ity”. The last two items, nervousness in
father and nervousness in mother, are, of
course, not essential to the “congeniality” pic-
ture but, at the same time, they are not espe-
cially inconsistent with the rest of the picture.
Most of the parental “nervousness” in such a 
family situation might be merely concern 
about the welfare and safety of the young 
people. Perhaps “family unity” might more 
precisely describe this particular family life 
pattern. 


As was the case in the two previous anal-
yses, one of the six factors (Factor V_c)
stands out as most clearly defined of all.
Furthermore, it is obviously identical with
the corresponding ones in the other analyses
(Factors II_f and IV_t). Precisely the same
items were involved and, with a few minor 
inversions, the items fall in the same order 
as to importance. These items are (1) chil- 
dren habitually confide in their mother, 
(2) children habitually confide in their 
father, (3) children join the family group on 
picnics, visits, etc., (4) the family group has 
frequent enjoyable times together in the home, 
(5) the children frequently kiss their mother, 
(6) the parents like to have the children 
bring their friends into the home, and (7) the 
family group has regular meals together. 
This pattern of “confidence, affection and 
companionability”, then, appears in almost 
exactly the same form in three independent
analyses in which wholly separate populations
were involved. The conclusion, therefore, that
this factor represents one “favorable” com- 
mon pattern in American family life seems 
justifiable. 

The sixth factor from the analysis of the 
city home life data is of little importance since 
only one item, roomers in the home, has a 
loading of even moderate size. The other four 
factor loadings are all very small. The rather
indistinct picture is one in which (1) roomers 
are present in the home, (2) the children 
seldom kiss their mother, (3) the mother 
works outside the home, (4) the parents wel- 
come their children’s friends into the home 
and (5) the children are frequently punished. 

The similarity between the configurations 
resulting from the analysis of the city family- 
life data and those resulting from the two 
previous analyses is worthy of special note. 
Although there were differences in the cor- 
responding total environmental, or family be- 
havior patterns among the three general home 
settings, they were in most cases so similar 
that the same descriptive designations fit all 
equally well. This fact is considered of some 
significance in that it furnishes evidence of 
the usefulness and validity of factor analysis 
as a method of grouping, coordinating and 
validating the many items of family life and 
the home environment. The same character- 
istic patterns tended to appear repeatedly 
when entirely different populations were used, 
even though each population represented 
quite a different general home setting and a 
somewhat different cultural background. A 
means eventually of determining the impor- 
tant, basic factors of the home environment 
is demonstrated. It is felt, furthermore, that 
a few of these important factors have been 
rather clearly identified in the present study. 
These were (1) a pattern of intrafamilial re- 
lationships which is characterized by “confi- 
dence, affection and companionability”, (2) a 
somewhat related, but less inclusive family 
behavior pattern which is characterized by 
“congeniality”, (3) a factor involving family 
discord, or perhaps parental misconduct, 
which predisposes the adolescent child to 
offer criticism of his parents’ behavior and 
which apparently includes frequent punish- 
ment of the child by the parents in the home, and 
(4) a “nervous tension” factor which includes 
“nervousness”, or at least something which 
the adolescent child interprets as nervousness, 
on the part of the parents, and usually other 




items as well, such as illness of parents and 
infrequent demonstrations of affection toward the 
mother. The other patterns were apparently 
either limited to one cultural group, or were 
common to only two of the three. 


II. THE RELATION OF FAMILY LIFE 
PATTERN TO THE PERSONALITY 
ADJUSTMENTS OF CHILDREN 


Since certain of the “family-life patterns” 
as revealed by the three separate factor anal- 
yses of the questionnaire data, coincided so 
closely as to suggest that they were perhaps 
universal factors in the various cultural set- 
tings, the next task was to attempt to deter- 
mine their significance in relation to the per- 
sonal development of children. Two patterns 
were, therefore, selected for study, viz., the 
“confidence, affection and companionability” 
pattern and the pattern which was designated 
as “family discord” or “parental misconduct”. 
The former is presumably a favorable factor 
while the latter would appear to be unfavor- 
able from the standpoint of the child’s 
adjustments. 

From each of the three cultural settings, 
two groups of subjects for each of the two 
factors were selected in terms of their answers 
to the heavily loaded items. The high cri- 
terion groups in each case were made up of 
those subjects who characterized their family 
situations as fitting the pattern. The low cri- 
terion groups likewise were made up of those 
who characterized their family situations as 
the direct opposite to the pattern. These con- 
trasting groups were then compared in terms 
of mean scores on tests of certain aspects of 
personality adjustment. 

Significance of the Pattern of “Confidence, 
Affection and Companionability”.—In Table 
7 are given the comparisons relative to the 
confidence, affection and companionability 
pattern. Of the 695 farm subjects, 167 indi- 
cated by their answers that their family situ- 
ations conformed to this pattern while 172 
indicated that their family situations were 
directly opposite to the pattern.* Of the 64 
small town subjects, 128 placed their families 
in the “conforming” group, while 181 placed 
their families in the opposite group. For the 
520 city subjects, the corresponding groups 
were 122 and 148. 


* The criterion adopted for the selection of these groups 
was that five of the seven significant items of the pattern, 
including the two most significant ones, must have been an- 
swered positively for the “conforming” group and negatively 
for the opposite group. 
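
The selection rule in this footnote can be sketched in code. The sketch below is illustrative only: the function name, the boolean encoding of answers, and the reading of "five of the seven" as "at least five" are assumptions, not details given in the study.

```python
from typing import Optional, Sequence


def classify_family(answers: Sequence[bool], min_agree: int = 5) -> Optional[str]:
    """Assign one subject to a criterion group (hypothetical helper).

    `answers` holds one boolean per significant item of the pattern
    (True = answered in the direction of the pattern), ordered from most
    to least significant, so the first two entries are the two most
    significant items. "At least five" is an assumed reading of the rule.
    """
    if len(answers) != 7:
        raise ValueError("expected answers to the seven significant items")
    positives = sum(answers)
    negatives = len(answers) - positives
    # Conforming: enough positive answers, including both key items.
    if positives >= min_agree and answers[0] and answers[1]:
        return "conforming"
    # Opposite: enough negative answers, including both key items.
    if negatives >= min_agree and not answers[0] and not answers[1]:
        return "opposite"
    return None  # subject belongs to neither criterion group


print(classify_family([True, True, True, True, True, False, False]))
```

Subjects for whom the function returns `None` correspond to the large middle group that entered neither contrasting group.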




TABLE 7 

A COMPARISON OF THE MEAN TEST SCORES OF SUBJECTS WHO CHARACTERIZED THEIR FAMILY 
LIFE AS CONFORMING TO THE PATTERN OF “CONFIDENCE, AFFECTION AND COMPANIONABILITY” 
WITH THE MEAN SCORES OF THOSE WHO CHARACTERIZED THEIR FAMILY LIFE AS NOT 
CONFORMING TO THE PATTERN 

                                   Family Life            Family Life 
                                   Conformed to           Did Not Conform 
                                   Pattern                to Pattern 
Personality Variable                N*    Mean     σ       N*    Mean     σ      Diff.   Diff./σ diff. 

Farm Subjects 
  Personal adjustment (Maller)     165    37.7    6.4     170    34.4   [illegible]   3.3     4.24 
  Independence                     165    38.9    9.1     164    37.0    9.5     1.9     1.84 
  Appreciation of home life        165    86.0    4.7     172    78.2   12.8     7.8     7.41 
  Average personality score**      167    61.0    4.6     172    57.9    5.3     3.1     5.66 

Small Town Subjects 
  Personal adjustment (Maller)     124    37.8    7.2     181    32.6    7.8     5.2     5.99 
  Independence                     128    39.6    8.9     176    34.6   10.7     5.0     4.40 
  Appreciation of home life        126    86.1    5.8     176    66.3   17.6    19.8    13.95 
  Average personality score**      128    61.8    4.7     180    57.6    5.3     4.2     7.31 

City Subjects 
  Personal adjustment (Maller)     117    41.1    5.7     145    36.3    7.3     4.8     6.04 
  Independence                     116    43.1    8.6     146    38.7   10.1     4.4     3.86 
  Appreciation of home life        122    87.1    2.4     155    76.5   15.7    10.6     7.34 
  Average personality score**      119    62.0    3.9     148    57.4    5.6     4.6     7.88 

* The slight variation in the number of subjects in the comparison groups for a given cultural setting 
is due to the fact that some subjects, for one reason or another, missed some of the tests or failed to 
complete them satisfactorily. 

** “Average personality score” was for each subject the average of his standard scores (modified) 
on all personality scales which he had answered completely. The battery furnished scores on nine different 
variables. 

Scores on nine different traits of personality 
and aspects of personal adjustment had been 
obtained for the subjects of this study. 
Descriptions of the tests and fuller reports of 
the results are given elsewhere (6 and 7). 
Three of these variables were selected for 


study in relation to the particular family-life 


pattern now under consideration. They were 
“personal adjustment” as measured by Mal- 
ler’s inventory (3), independence of decision 
in personal matters—a variety of self- 
reliance (5), and appreciation of (attitude 
toward) home life. 

As might be expected the differences be- 
tween contrasting groups in appreciation of 
home life were greatest. These mean differ- 
ences ranged from 7.8 attitude scale points for 
the farm groups, to 19.8 points for the small 
town groups. 

The differences in mean personal adjust- 
ment scores were next in size. They ranged 
from 3.3 scale points for the farm groups to 
5.2 points for the small town groups. These 
differences are all highly reliable (4 to 6 times 
their standard errors), and are large enough 


to indicate a definite relationship between 
the “confidence, affection, and companion- 
ability” pattern of family life as reported by 
adolescent subjects, and the adequacy of their 
own personal adjustment. 
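
The "times their standard errors" figures are critical ratios, that is, each mean difference divided by the standard error of that difference. The text does not print the formula used; the sketch below assumes the usual expression for two independent groups, sqrt(σ1²/N1 + σ2²/N2), and applies it to the small town personal adjustment row of Table 7 (reading the first standard deviation as 7.2). The function name is illustrative.

```python
import math


def critical_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """Difference between two group means divided by its standard error.

    Assumes independent groups, so the standard error of the difference
    is sqrt(sd1^2/n1 + sd2^2/n2). This formula is an assumption; the
    article reports only the resulting ratios.
    """
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se_diff


# Small town subjects, personal adjustment (Table 7):
# conforming group N=124, mean 37.8, sd 7.2; opposite group N=181, mean 32.6, sd 7.8
cr = critical_ratio(37.8, 7.2, 124, 32.6, 7.8, 181)
print(round(cr, 2))  # 5.99, matching the tabled value
```

Under this assumption the tabled ratio for that row is reproduced; other rows may differ in the second decimal because the printed means and standard deviations are rounded.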

“Independence of decision” also bore a re- 
lationship to this family life pattern, particu- 
larly with the small town and the city sub- 
jects. The difference between the two farm 
groups in mean “independence” score of 1.9 
scale points, although in the expected direc- 
tion, was not large enough to be significant. 

In order to obtain a sort of general person- 
ality rating for each subject, modified sigma 
scores on each scale were made up and aver- 
aged. The contrasting groups for each home 
setting were compared in terms of the means 
of these general ratings. The differences were 
all highly reliable, the critical ratios ranging 
between 5.66 and 7.88. These differences sug- 
gest that the children of families characterized 
by this “favorable” pattern of family life 
were superior to those of families of the oppo- 
site sort, not only in personal adjustment, in- 
dependence and appreciatory attitude toward 




home life, but also in their general personality 
development. This particular pattern may, 
then, be regarded as an important, as well as 
a “universal” factor of the home environment. 

Significance of the “Family Discord” Pat- 
tern.—The “family discord” pattern, it will 
be recalled, was not so clearly identical in the 
three cultural settings as was the pattern dealt 
with above. There was some variation as to 
the particular items included, as well as to 
their relative importance in the pattern. Con- 
trasting groups from each home setting were 
nevertheless selected and were compared in 
regard to mean personal adjustment score and 
mean score in appreciation of home life. The 
results of these comparisons appear in 
Table 8. 

The differences, again, are all in the ex- 
pected direction and are all highly reliable 
statistically. The children of families which, 
according to the reports of the children were 
characterized by “family discord”, the essen- 
tial element of which apparently was the lack 
of judgment and perhaps misconduct on the 
part of parents, were definitely less well ad- 
justed to life and were definitely less appre- 
ciative of home life than were children of 
families which were not thus characterized. 

The question was raised above (page 7) 
as to the probability that this factor repre- 
sented a temporary critical attitude or a re- 
sentment on the part of the subject due to 
recent punishment rather than a permanent 
family situation characterized by parental 
misconduct and family disharmony. The fact 




that a definite relationship appeared between 
this factor and scores on the personal adjust- 
ment scale suggests the factor represents 
something rather stable in the situation, since 
the adjustment score is highly reliable (,) 
and presumably therefore would not vary 
widely with temporary critical attitudes 
toward parents. 

The Independence of the Two Patterns — 
The two “patterns of family life” here con- 
sidered were, in each of the three factor anal- 
yses, represented by two axes of an orthogonal 
system, and were, therefore, theoretically un- 
correlated with each other. It seemed desir- 
able nevertheless to determine to what extent 
those subjects who were in the non-fitting 
group of the “favorable” pattern were also in 
the group which conformed to the “unfavor- 
able” pattern and vice versa. The tetrachoric 
correlation between conforming to the favor- 
able pattern and not conforming to the un- 
favorable pattern was computed giving an r 
of —.13. This, of course, means that the two 
home-environmental factors varied in our 
sample almost entirely independently of each 
other. 
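
For readers who wish to see how a tetrachoric coefficient is obtained from a fourfold table, the sketch below uses the cosine-pi approximation, r ≈ cos(π / (1 + sqrt(ad/bc))). This is a swapped-in approximation, not necessarily the procedure used here; the reference list points to computing diagrams (1). The study does not print its fourfold table, so the cell counts below are hypothetical and chosen only to illustrate the arithmetic.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    The fourfold table is [[a, b], [c, d]], with a and d the concordant
    cells (both traits present, both absent) and b and c the discordant
    cells.
    """
    if a * d == 0 and b * c == 0:
        raise ValueError("coefficient is undefined for this table")
    if b * c == 0:
        return 1.0   # no discordant cases: perfect positive association
    if a * d == 0:
        return -1.0  # no concordant cases: perfect negative association
    return math.cos(math.pi / (1 + math.sqrt((a * d) / (b * c))))


# Hypothetical counts (not from the study):
r = tetrachoric_cos_pi(40, 60, 60, 40)
print(round(r, 3))
```

A table with equal concordant and discordant products (ad = bc) gives a coefficient of zero, which is the sense in which a value as small as the one reported indicates near independence.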

The Combined Relationship of the Two 
Factors to Personality Adjustment.—Eighty 
of the farm subjects, 44 of the small town 
subjects and 53 of the city subjects, however, 
indicated by their answers that their family 
situations not only conformed to the favorable 
pattern but also were directly opposite to the 
unfavorable pattern. These are hereafter re- 
ferred to as the “favorable home environ- 


TABLE 8 

A COMPARISON OF THE MEAN SCORES IN “ADJUSTMENT” AND HOME LIFE APPRECIATION OF 
ADOLESCENTS WHO CHARACTERIZED THEIR FAMILY LIFE AS FITTING THE PATTERN OF 
“FAMILY DISCORD” WITH THE MEAN SCORES OF THOSE WHO CHARACTERIZED THEIR FAMILY 
LIFE AS NOT FITTING THE PATTERN 

                                   Family Life            Family Life 
                                   Conformed to           Did Not Conform 
                                   Pattern                to Pattern 
Personality Variable                N*    Mean     σ       N*    Mean     σ      Diff.   Diff./σ diff. 

Farm Subjects 
  Personal adjustment               82    33.8    7.3     297    37.6    6.4    -3.8    -4.32 
  Appreciation of home life         82    77.8   15.2     297    84.6    6.1    -6.8    -3.9 

Small Town Subjects 
  Personal adjustment              216    33.7    7.1     163    36.4    7.4    -2.7    [illegible] 
  Appreciation of home life        212    79.3   14.7     163    83.9    9.0    -4.6    -3.72 

City Subjects 
  Personal adjustment               99    35.0    7.4     148    37.9    6.0    -2.9    -3.26 
  Appreciation of home life         80    75.6   17.8     160    85.1    8.7    -9.5    -4.51 




TABLE 9 

COMPARISON OF THE MEAN PERSONALITY TEST SCORES OF GROUPS WHO CHARACTERIZED THEIR 
FAMILY LIFE AS CONFORMING TO THE “CONFIDENCE, AFFECTION AND COMPANIONABILITY” 
PATTERN, BUT DEFINITELY NOT CONFORMING TO THE “FAMILY DISCORD” PATTERN, WITH 
THE MEAN SCORES OF THE TOTAL RESIDENCE GROUPS 

                                   Favorable Home 
                                   Environment group*     Total Sample 
Personality Variable                N     Mean     σ       N     Mean     σ      Diff.   Diff./σ diff. 

Farm Subjects 
  Personal adjustment               80    38.9    8.5     689    36.3    6.9     2.6     3.41 
  Independence                      80    40.5    9.0     667    38.7    9.6     1.8     1.47 
  Appreciation of home life         80    86.3    4.2     670    83.1   10.1     3.2     5.25 

Small Town Subjects 
  Personal adjustment               44    40.1    7.1     634    34.9    7.5     5.2     [illegible] 
  Independence                      44    42.0    7.9     604    37.3   10.0     4.7     3.74 
  Appreciation of home life         44    86.5    3.8     617    82.1   12.1     4.4     [illegible] 

City Subjects 
  Personal adjustment               51    41.9    5.2     481    38.2    6.6     3.7     4.70 
  Independence                      50    44.8    7.2     489    40.8    9.5     4.0     3.62 
  Appreciation of home life         53    87.6    1.8     518    82.4   11.7     5.2     9.12 

* A “favorable home environment” is one which, according to the youngsters’ answers to the ques- 
tionnaire, fits the pattern of “confidence, affection and companionability”, but which, at the same time, 
lacks all the characteristics of the “family discord” pattern. 


TABLE 10 

COMPARISON OF THE MEAN PERSONALITY TEST SCORES OF GROUPS WHO CHARACTERIZED THEIR 
FAMILY LIFE AS FITTING THE “FAMILY DISCORD” PATTERN BUT NOT FITTING THE PATTERN 
OF “CONFIDENCE, AFFECTION AND COMPANIONABILITY”, WITH THE MEAN SCORES OF THE 
TOTAL RESIDENCE GROUPS 

                                   Unfavorable Home 
                                   Environment group*     Total Sample 
Personality Variable                N     Mean     σ       N     Mean     σ      Diff.   Diff./σ diff. 

Farm Subjects 
  Personal adjustment               29    32.4    8.5     689    36.3    6.9    -3.9    -2.44 
  Independence                      27    37.5   11.0     667    38.7    9.6    -1.2    -0.61 
  Appreciation of home life         28    72.8   16.3     670    83.1   10.1   -10.3    -3.32 

Small Town Subjects 
  Personal adjustment               84    32.8    7.4     634    34.9    7.6    -2.1    -2.44 
  Independence                      81    34.8    9.8     604    37.3   10.0    -2.5    -2.16 
  Appreciation of home life         83    73.2   18.5     617    82.1   12.1    -8.9    -4.26 

City Subjects 
  Personal adjustment               44    33.9    6.6     481    38.2    6.6    -4.3    -3.12 
  Independence                      45    35.9    9.8     489    40.8    9.5    -4.9    -3.22 
  Appreciation of home life         45    74.7   17.6     518    82.4   11.7   -10.7    -4.00 

* An “unfavorable home environment” is one which, according to the child’s answers to the ques- 
tionnaire, lacks all the characteristics of the pattern of “confidence, affection and companionability” but 
which fits the “family discord” pattern. 


ment” groups. On the other hand, 29, 84 and 
45 of the farm, small town, and city subjects 
respectively, indicated that their family situa- 
tions conformed to the unfavorable pattern 
and at the same time were the direct opposite 
to the favorable pattern. These groups are 
called the “unfavorable home environment” 
groups. 

In order to show the combined relationship 
of the two factors to personality adjustments 
in children, the mean test scores of the favor- 
able, and of the unfavorable home environ- 




ment groups from each cultural setting were 
compared with the means for all the subjects 
from that setting. We wished to know, in 
other words, whether or not the favorable 
home environment subjects averaged signifi- 
cantly higher, and the unfavorable home envi- 
ronment subjects significantly lower in adjust- 
ment, independence in personal matters, and 
appreciation of home life as measured by our 
tests, than did the total cultural groups to 
which they belonged. In Tables 9 and 10 are 
given the comparisons involving the “favor- 
able” and the “unfavorable” groups respec- 
tively. 

All of the differences were in the expected 
direction and with the exception of those in 
“independence” in the farm subjects, were 
quite reliable. In the case of the favorable 
home environment groups, the critical ratios 
of the eight significant differences ranged from 
3.41 for personal adjustment in the farm sub- 


[Figure 1. Panels: PERSONAL ADJUSTMENT; INDEPENDENCE (SELF-RELIANCE); APPRECIATION OF HOME LIFE] 

Fig. 1. Showing the deviations, in terms of 
sigma units, of the mean scores in 
adjustment, independence and home life appre- 
ciation of the “favorable”, and of the “unfav- 
orable” home environment groups, from the 
mean scores of the total residence groups. 




jects, to 9.12 for appreciation of home life in 
the city subjects. In the case of the unfavor- 
able groups, the critical ratios of the eight 
significant differences ranged from -2.15 for 
independence in the town subjects, to -4.26 
for appreciation of home life in the town 
subjects. 

The extent to which the favorable, and the 
unfavorable home environmental groups of 
each cultural setting were differentiated from 
the total groups is shown graphically in 
Figure 1 in terms of the critical ratios of the 
differences. The variable most closely associ- 
ated with the character of the home environ- 
ment, as might be expected, was the child’s 
attitude toward his home life. The variable 
which was associated to the smallest extent 
with the character of the home environment 
was independence in meeting personal prob- 
lems. The adequacy of the child’s adjustment 
to life in general as measured by Maller’s in- 
ventory, was very definitely and reliably asso- 
ciated with the character of the home environ- 
ment. The two common family life patterns 
here investigated are thus shown to be impor- 
tant factors of the home environment when 
judged in terms of their relationship to per- 
sonality adjustments in children. 


REFERENCES 


1. Chesire, L., Saffir, M. & Thurstone, L. L. 
Computing Diagrams for the Tetrachoric 
Correlation Coefficient. Chicago: Univ. 
Chicago Bookstore, 1933. 

2. Guilford, J. P. Psychometric Methods. 
New York: McGraw Hill, 1936. 

3. Maller, J. B. The CASE inventory. New 
York: Bur. Publ., Teach. Coll., Columbia 
Univ., 1935. 

4. ——. The CASE inventory, manual of 
directions (Mimeographed). New York: 
Bur. Publ., Teach. Coll., Columbia Univ. 
1935. 

5. Stott, L. H. An analytical study of self- 
reliance. J. Psychol., 1938, 5, 107-118. 

6. ——. The relation of certain factors in 
farm family life to personality develop- 
ment in adolescents. Nebr. Agr. Exp. Sta. 
Res. Bull. 106, 1939. 

7. ——. Personality development in farm, 
small town and city adolescents. Nebr. 
Agr. Exp. Sta. Res. Bull. 114, 1939. 

8. Thurstone, L. L. The Vectors of Mind. 
Chicago: Univ. Chicago Press, 1935. 




SEGREGATION AS A FACTOR IN THE RACIAL 
IDENTIFICATION OF NEGRO PRE-SCHOOL CHILDREN: 
A PRELIMINARY REPORT 


KENNETH B. CLARK 
Columbia University 
and 
MAMIE K. CLARK 
Howard University 


In conjunction with an investigation (1, 2) 
of racial identification of Negro pre-school 
children as indicative of a phase in the de- 
velopment of consciousness of self, some 
Negro children from mixed New York nurs- 
ery schools were compared with the main 
group of Negro children from segregated 
Washington, D. C. nursery schools. It was 
believed that such a comparison would give 
an indication of the possible effects of segre- 
gation as a factor affecting the problem 
investigated. 

In the segregated Washington nursery 
schools the children and personnel were all 
Negroes. In the mixed New York nursery 
schools there were two sub-groups: (1) Negro 
subjects from a nursery school with all Negro 
children, some Negro teachers, one White 
teacher and a White cook. This group is clas- 
sified as a semi-segregated group in view of 
the fact that the children, all of whom were 
colored, came in contact with two White 
members of the staff. (2) Negro subjects 
from a nursery school containing both White 
and Negro children and White personnel. 
This group is classified as the mixed group 
proper. 

In this report the results obtained from the 
semi-segregated and mixed New York groups 
will be presented and compared with results 
obtained from the segregated Washington 
group. Data from the latter group have 
already been published (1) and will there- 
fore not be presented here in detail. 

A modification of the Horowitz picture 
technique (3) was used. Children were asked 
to show the experimenter which one of a 
series of drawings of white and colored boys, 
animals and a clown they considered to be 
themselves. The Washington children were 
tested by the experimenter whereas the New 


York children were tested by their teachers 
who had full directions as to procedure. 

Only three- and four-year-old subjects 
were compared among the three groups since 
there were no five year olds in the New York 
schools. It should also be noted that the 
number of cases in the semi-segregated (19) 
and in the mixed group (21) are too few to 
permit definite conclusions. Since there were 
100 children in the segregated group, how- 
ever, the results may serve as a basis for the 
observation of some general trends in the 
New York groups which might serve as a 
basis for future work on the problem. 


RESULTS 


Total Group Responses.—There were no 
clear cut differences between the semi- 
segregated subjects (New York Group 1) 
and the segregated subjects in choices of col- 
ored or white boys or irrelevant pictures on 
the picture series. Both groups made more 
choices of the colored boy than of the white 
boy. The semi-segregated group made 30 
choices of colored boy, 22 choices of white 
boy and 5 irrelevant choices. Subjects in the 
mixed group (New York Group 2), however, 
made an equal number of choices of white 
and colored boy (25 each) and 13 irrelevant 
choices. As compared with the segregated 
(7.5%) and the semi-segregated (8.7%), the 
percentage of irrelevant choices in the mixed 
group (20.6%) is high. 
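
The percentages quoted above follow directly from the choice counts. As a check, the short sketch below recomputes them for the two New York groups; the function name is illustrative, and the counts are those given in the paragraph. Note that 5 of 57 is 8.77 per cent, which the text prints as 8.7.

```python
def choice_percentages(colored, white, irrelevant):
    """Return each category of choice as a percentage of all choices."""
    counts = {"colored": colored, "white": white, "irrelevant": irrelevant}
    total = sum(counts.values())
    return {name: 100 * count / total for name, count in counts.items()}


semi_segregated = choice_percentages(colored=30, white=22, irrelevant=5)
mixed = choice_percentages(colored=25, white=25, irrelevant=13)

for label, group in (("semi-segregated", semi_segregated), ("mixed", mixed)):
    print(label, {name: round(pct, 1) for name, pct in group.items()})
```

The mixed group's equal split between colored and white boy (25 each) is what the following paragraph describes as approximating a chance frequency.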

In general, the tendency to identify with 
either the colored or the white boy seems to 
approximate a chance frequency among those 
Negro children in nursery schools where there 
are both white and colored children, while a 
trend toward identifying with the colored boy 
is more pronounced in the Negro children in 
the semi-segregated group and even more so 
in the all-Negro nursery schools. 




Age Level Responses.—In the segregated 
group there was an increase in the percentage 
of difference between choices of colored and 
white boys, in favor of the colored boy, from 
the three to the four year level. In spite of 
the few cases, the semi-segregated group 
shows this same trend (difference between 
percents of choices of white and colored boy 
was 10.5% at the three year level and 22.3% 
at the four year level, both in favor of the 
colored boy). In both these groups the per- 
centage of choices of colored boy increases 
from the three year to the four year level, 
while the percentage of choices of white boy 
remains approximately the same (for semi- 
segregated group choices of colored boy 
48.7% to 61.1% from three to four year level 
respectively; white boy 38.2% to 38.8%). 


The mixed group tends to resemble the 
segregated group in that the difference be- 
tween percents of choices of white and col- 
ored boy increases with age in favor of the 
colored boy (—8.3% at the three year level 
to 11.1% at the four year level). But where- 
as the increase in choice of colored boy over 
white boy in the segregated and semi- 
segregated groups is at the expense of choices 
of less relevant pictures of animals and clown, 
this does not appear to be true of the mixed 
school subjects. Beginning at the four year 
level, the segregated and semi-segregated 
children cease to identify themselves in terms 
of the animals or the clown and consistently 
identify in terms of either the colored or white 
boys with a greater trend toward choice of 
colored boy. At the four year level, although 
children in the mixed school tend to identify 
themselves more with the colored boy than 
with the white boy, some of these four-year- 
olds still identify themselves with irrelevant 
pictures. Here again the choices of mixed 
children at the four year level are rather con- 
fused as compared with the clean cut drop- 
ping off of irrelevant responses by the segre- 
gated and semi-segregated children. These 
results suggest that the level in development 
of self-awareness, where identification of self 
is in terms of distinct persons, which was 
attained by the segregated and semi- 
segregated Negro children at the end of three 
years of age, was not attained at the end of 
the three year level by Negro children in 
mixed school situations. 


This general tendency of the Negro sub- 
jects in the mixed nursery schools to deviate 




from the trends of the all-Negro nursery 
school subjects and the semi-segregated sub- 
jects suggests the possibility that racial ident- 
ification as a phase of ego-consciousness 
develops comparatively later in the mixed 
nursery school subjects and tends to be more 
confused in its expressions. 


When age level groups were further sub- 
divided into male and female the few cases 
in each sex classification showed no different 
trends and are therefore not compared with 
sex-difference results from the segregated 
children. 


Skin Color.—Three- and four-year-old 
children in each of the three groups were sub- 
divided, on the basis of skin color, into light, 
medium and dark. Due to the small number 
of New York children the three and four year 
children were combined for analysis of these 
results. 


Whereas light children in the segregated 
group made more choices of the white boy 
than of the colored boy, choices of light chil- 
dren in both the semi-segregated and mixed 
groups approximated chance (light semi- 
segregated: 11 choices of white boy and 10 
of colored boy; light mixed: 5 choices of 
white boy and 4 of colored boy). 


While both medium and dark children in 
the segregated group made more choices of 
colored boy than of white boy, and medium 
and dark children in the semi-segregated 
group followed the same trend, choices of 
medium and dark children in the mixed group 
approximated chance (semi-segregated group: 
medium-colored boy 13 choices, white boy 
10; and dark-colored boy 6 choices, white 
boy 2; mixed group: medium-colored boy 
11 choices, white boy 10; and dark-colored 
boy 9 choices, white boy 11). 

In the segregated group identification of 
light children more with pictures of the white 
boy and identification of medium and dark 
children more with pictures of the colored 
boy had suggested a level of development in 
self awareness where racial identification of 
self is in terms of one’s own skin color (2). 
In the semi-segregated group, results suggest 
that this might also be true of these children 
were there more cases. It is difficult to as- 
sume any indication of the same trend for 
children in the mixed group since light, 
medium and dark children make choices 
approximating chance. 




It is impossible to draw any definite con- 
clusions from these results in the light of the 
small number of cases in each classification 
of the mixed and semi-segregated groups. 
The data indicate, however, that the semi- 
segregated New York group is similar to the 
segregated Washington group in reference to 
the relationship of racial identifications to the 
skin color of the subjects; while the mixed 
New York group shows no such relationship, 
since racial identifications of each classifica- 
tion of skin color approximate chance. 


DISCUSSION 


In an analysis of these preliminary data, 
the fact that the mixed New York subjects 
deviated from the general trend of responses 
found in both the segregated Washington 
subjects and the semi-segregated New York 
subjects stands out clearly. If this finding is 
substantiated by further investigation, it will 
be necessary to interpret the problem of con- 
sciousness of self and racial identification not 
only in the light of such personal factors as 
age, intelligence and skin color of the subject, 
but also in terms of the nature of the envi- 
ronment in which the subject is tested. 


The data indicate that children in the 
mixed nursery school appear to develop a con- 
sciousness of self and concomitant racial 
identification at a later chronological period 
than either the wholly segregated children or 
the semi-segregated children. 


Even this indicated analysis in terms of a 
retardation is an assumption which will need 
further investigation, in that it is difficult to 
know at present whether these children in 
mixed schools follow the same sequence of 
development of ego and racial awareness as 
was found in the segregated Washington 
children. Nevertheless, for purposes of com- 
parison a general similarity in pattern of 
development will be assumed. This retarda- 
tion and seeming confusion of identifications 
of these subjects in the mixed nursery school 
suggest that other factors not present in the 
semi-segregated or segregated group situation 
are operative in modifying the expressions of 
this function. 

The most obvious factor seemingly respon- 
sible for this retardation and chance racial 
identifications of the subjects in the mixed 
nursery school is the presence of white chil- 
dren of their own age in the same nursery 
school. This factor seems at present to be 




the determinant of the deviation in responses 
of this mixed group from responses of the 
other two groups. The factor of intelligence 
will have to be investigated before this con- 
clusion can definitely be stated. 


This difference in the physical character- 
istics of individuals making up the environ- 
ment for the developing child seems at 
present sufficient to change markedly the 
operation of the dynamics of self-awareness 
and racial identification as found in a study 
of segregated Negro children. 


The differentials of age in the mixed group 
do not seem to operate in the same manner 
as in the segregated and semi-segregated 
groups. Children in the mixed groups made 
identifications in terms of irrelevant pictures 
at the four year level, while the other groups 
ceased to identify themselves with irrelevant 
pictures at the end of the three-year level. 


Skin color differentials also seem to break 
down with a change in the environment of 
the subjects. In the segregated group racial 
identifications were made for the most part 
upon the basis of the skin color of the sub- 
jects. This same trend was found to be oper- 
ative in the semi-segregated group. However, 
there was no tendency whatsoever toward 
this trend in the mixed group. This suggests 
the possibility that the racial identifications 
of children in the mixed group were to a large 
extent determined by the physical character- 
istics of those in their immediate environ- 
ment. It is a question, to be settled by fur- 
ther work, whether this social factor has not 
gained priority over the factor of their own 
skin color as a determinant of the racial 
identifications of these Negro children. 


REFERENCES 


1. Clark, K. B. and Clark, M. K. “The 
Development of Consciousness of Self and 
the Emergence of Racial Identification in 
Negro Preschool Children.” Journal of 
Social Psychology, 10: 591-599, 1939. 


2. Clark, K. B. and Clark, M. K. “Skin 
Color as a Factor in Racial Identification 
of Negro Preschool Children.” In press. 
Journal of Social Psychology, Jan. 1940. 


3. Horowitz, R. E. “Racial Aspects of Self 
Identification in Nursery School Chil- 
dren.” Journal of Psychology, 7: 91-99, 
1939. 


THE INFLUENCE OF NURSERY SCHOOL EDUCATION 
UPON BEHAVIOR MATURITY 


WALTHER JOEL 
Los Angeles City College 


The effect of nursery school education 
upon test-intelligence has been studied more 
than once. The rise in IQ reported in these 
studies has been ascribed to the stimulating 
environment of the nursery school which pro- 
vides opportunities for manifold experiences 
essential to mental development. 

This paper presents some data related to 
the effect of nursery school education upon a 
different aspect of the child’s personality for 
which the author has proposed the term Be- 
havior Maturity. This term is defined as 
grown-upness, the opposite of childishness, 
or more specifically as the relative degree of 
independence, self-control, and social attitude 
reached. 

The results were obtained by means of a
rating scale more fully described elsewhere.¹
This scale consists of twenty items, each
describing a different situation with five
possible types of reaction for each. These twenty
items are grouped in three sections under the
headings of “Routine Habits”, “Emotional
Maturity”, and “Social Maturity”. The scale
was found to correlate .65 ± .02 with CA
(N = 467) and .036 with IQ (N = 88).
Hence it may be assumed to measure some 
aspect of the child’s behavior which varies 
with chronological age and which is not 
related to intelligence. 

The subjects were 425 children who, at
the time of rating, had been attending
nursery school from one to thirty-six months,
with a median attendance of nine months.
Children with less than one month attendance
were not rated; thus the teachers had
a chance to become familiar with the child
before rating and the child had time to
overcome the difficulties of initial adjustment.

¹ Joel, Walther, “Behavior Maturity of Children of Nursery
School Age”, Child Development, 7, 1936, 189-199.


The longer the children have attended 
nursery school, the older they tend, of course, 
to be. To determine the true effect of nursery 
school attendance upon Behavior Maturity, 
it is, therefore, necessary to eliminate this 
age influence. For this purpose the following 
procedure was adopted. The children were 
divided into age groups of ten months each. 
Within each age group, they were ranked 
according to time spent in nursery school. 
Then each age group was divided into two 
approximately equal parts, one comprising 
those children who had spent more (b), the 
other those who had spent less (a), time in 
the nursery school than the median of that 
age group. Then all a-groups were combined 
into a total group A, and the b-groups were 
combined to make up the total group B. In 
this way group A and group B were well 
matched for CA and any differences that 
might be found in the Behavior Maturity of 
the two groups cannot be ascribed to differ- 
ences in chronological age. In fact, an inspec- 
tion of Table I will show that the average 
CA of group B, with more months spent in 
nursery school, is slightly lower than that of
group A, as there are a few more of the 
younger children in B and a few less of the 
older ones. 
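
The grouping procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the sample records and the rule that a child falling exactly at the median goes to group A are not taken from the original study.

```python
from statistics import median


def split_by_attendance(children, band_start=20, band_width=10):
    """Split children into group A (less attendance) and group B (more).

    `children` is a list of (ca_months, months_attended) tuples.
    """
    # Collect the children of each ten-month age band.
    bands = {}
    for ca, attended in children:
        band = (ca - band_start) // band_width
        bands.setdefault(band, []).append((ca, attended))

    group_a, group_b = [], []
    for band in sorted(bands):
        members = bands[band]
        # Median attendance is computed separately within each age band.
        cut = median(att for _, att in members)
        for child in members:
            # Assumption: a child exactly at the band median goes to A.
            if child[1] > cut:
                group_b.append(child)
            else:
                group_a.append(child)
    return group_a, group_b


# Invented records, for illustration only.
sample = [(22, 1), (25, 2), (27, 5), (28, 8),
          (33, 2), (35, 4), (36, 9), (38, 12)]
group_a, group_b = split_by_attendance(sample)
```

Because the split is made inside each age band, the two combined groups contain nearly the same number of children from every band, which is what keeps them comparable in chronological age.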


TABLE I

                      Group A                       Group B
CA (months)      N    Months in              Months in          N
                      Nursery School         Nursery School
20-29           13    1-2                    3 and more        18
30-39           37    1-4                    5 and more        41
40-49           66    1-6                    7 and more        69
50-59           61    1-10                   11 and more       59
60-69           28    1-9                    10 and more       25
70-79            4    1-9                    10 and more        4
Total (N = 425) 209                                           216




Age norms for the total score and for the 
three sub-scores had been established. By 
dividing the Behavior Maturity Age derived 
from the norms for each child by his CA, the 
Behavior Maturity Index was obtained. In 
the same way quotients on the three sub- 
scores were computed for each child. As the 
norms had been adjusted in such a way that 
the median quotient for any CA group was 
close to or equal to 100, the Behavior
Maturity Index and the three sub-quotients 
express the respective scores independent of 
chronological age. 
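
As a small worked illustration of the quotient, the following sketch divides a Behavior Maturity Age by the chronological age. The multiplication by 100 is an assumption, inferred from the statement that the median quotient lies close to 100; the function name and the sample ages are hypothetical.

```python
def maturity_index(behavior_maturity_age, chronological_age):
    """Quotient of Behavior Maturity Age to CA, both in months.

    Assumption: the quotient is expressed on a base of 100.
    """
    return 100.0 * behavior_maturity_age / chronological_age


# A child of 40 months whose score corresponds to the norm for 44 months.
example_index = maturity_index(44, 40)
```

The same function would serve for the three sub-quotients, with the age equivalent of the sub-score in place of the Behavior Maturity Age.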

The results are shown in Table II. There 
is a significant difference in Behavior Matur- 
ity Index, the children who spent more time 
in nursery school having the higher average
Index. Of the three sections of the scale, the
items indicating social maturity contribute 
most to this difference. There are 99.75 
chances in 100 that the difference in emo- 
tional maturity is likewise significant. For 
the routine habits section of the scale the 
chances are 96 in 100 that the true difference 
between the two group averages is greater 
than zero. 
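
The figures in Table II can be related to one another by a short computation. The sketch below assumes that the critical ratio is the difference divided by its standard error and that the "chances in 100" are read from the normal curve; the paper does not state which tables were used, so the last decimal of a printed value may differ from what this sketch gives.

```python
import math


def critical_ratio(difference, sigma_difference):
    """Difference between two means divided by its standard error."""
    return difference / sigma_difference


def chances_in_100(cr):
    """Chances in 100 that the true difference exceeds zero.

    Assumption: the area of the normal curve below `cr` is used.
    """
    return 100.0 * 0.5 * (1.0 + math.erf(cr / math.sqrt(2.0)))


# Routine habits row of Table II: difference 5.97, sigma 3.50.
cr_routine = critical_ratio(5.97, 3.50)
chances_routine = chances_in_100(cr_routine)
```

Rounded to two decimals and to the nearest whole chance respectively, the routine habits row gives a critical ratio of 1.71 and 96 chances in 100.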

These results show that longer nursery 
school attendance is associated with greater 
Behavior Maturity as measured by the “Be- 
havior Maturity Rating Scale for Nursery 
School Children”. Such a relation would in- 
dicate that the nursery school successfully 
influences the child to grow up emotionally 
and socially. 


TABLE II

                               Difference                             Chances in 100
                               Mean B - Mean A   σ difference   C.R.   that true diff > 0
Routine Habits Quotient          5.97             3.50          1.71    96
Emotional Maturity Quotient      9.15             3.21          2.85    99.75
Social Maturity Quotient        17.34             2.89          6.00   100
Behavior Maturity Index         11.27             2.54          4.44   100


AN EVALUATION OF ASPECTS OF THE ACTIVITY PROGRAM 
IN THE NEW YORK CITY PUBLIC ELEMENTARY SCHOOLS 


ARTHUR T. JERSILD, ROBERT L. THORNDIKE AND BERNARD GOLDMAN
Teachers College, Columbia University
AND
JOHN J. LOFTUS
Board of Education, New York City Public Schools 


PART I 
BACKGROUND OF THE STUDY 


This report represents an interim summary 
of findings obtained in an extensive research 
program* designed to measure certain out- 
comes of the “activity” program which was 
inaugurated in 1935 by the Board of Educa- 
tion of New York City in a number of ele- 
mentary schools.** The present report covers 
the period from the Spring of 1937 to, and 
including, the Spring semester of 1939. It in- 
cludes hitherto unpublished results as well as 
a recapitulation of earlier articles that have 
been published from time to time.(1—5) 

The Board of Education of the City of 
New York decided, in 1935, to embark upon 
a program of experimentation with newer 
and more “progressive” educational methods. 
Typical schools throughout the city were 
designated as “activity schools,” and were en- 
couraged to develop a program based upon 
units of pupil activity as a substitute for, and 
supplement to, the traditional textbook learn- 
ings. 

From the first, Dr. John J. Loftus, Assist- 
ant Superintendent of Schools, in charge of 
the Activity Program, planned to include in 
this experiment in curriculum and methodol- 
ogy, a continuing experimental appraisal of a 
wide variety of outcomes. These appraisals


* This study was conducted with the aid of workers affili- 
ated with the Work Projects Administration, Research and 
Clerical Department, Official Project No. 665-97-3-6, Sub- 
Project No. 1, in cooperation with Teachers College, Colum- 
bia University, New York City. 

** The authors gratefully acknowledge the helpfulness and 
consideration shown by the principals and teachers in the 
schools included in the study, and the aid rendered by the 
Advisory Committee on Evaluation of the Activity Program in 
New York City Schools. The authors are grateful to Mr. 
Joseph Rechetnick, who was the Project Supervisor in the 
early stages of the study and who continued to f 
of his time, for helpful counsel after he withdrew from the 

The study was originally organized by Dr. J. Wayne
Wrightstone in consultation with the Advisory Committee.
Dr. Wrightstone also devised several of the procedures
in the investigation, and directed the project until 1937, when
other responsibilities made it necessary for him to withdraw.
The instruments of measurement prepared in the early stages
of the study by Dr. William A. McCall and Dr. John P. 
Herring are also indicated in the main body of the report. 
The authors are deeply indebted to the anonymous 
workers who carried the day by day burdens of the study. 


were devised in consultation with the Advis- 
ory Committee for the Activity Program. 
Members of the Advisory Committee who 
were actively involved in the program of 
evaluation included Dr. William A. McCall, 
Professor of Education, Teachers College,
Dr. John P. Herring, Research Specialist, 
and Dr. J. Wayne Wrightstone, then Research 
Associate at Teachers College, Columbia Uni- 
versity, and now Assistant Professor of Edu- 
cation, Bureau of Educational Research, 
Ohio State University. 

The Advisory Committee judged from the 
first that evaluation of the activity experi- 
ment must be very broad in its scope if the 
results were to be of any value in guiding the 
future policy of the schools of New York 
City. It was planned to endeavor in the re- 
search program to measure as many desir- 
able outcomes of education as was feasible. 
A large part of the energy of those concerned 
with the evaluation was devoted, therefore, 
to the development of instruments for meas- 
uring the broader and less tangible outcomes 
of an educational enterprise. The research 
instruments that were devised included: 
(1) a broad test of achievement prepared by 
Drs. McCall and Herring, (2) several tests 
of study skills, social attitudes, and individual 
adjustment prepared by Dr. Wrightstone, and 
(3) methods of observation of classroom 
activities developed by Dr. Wrightstone. 
These measures will be described later in 
greater detail. In addition, results were avail- 
able from the Modern School Achievement 
Test administered at various stages of the 
investigation. 


PART II 
SUBJECTS AND PROCEDURE 
GROUPS STUDIED
The activity experiment was initiated in a
total of seventy schools, distributed throughout
the various sections of the City of New
York. These schools were selected upon the
basis of (1) typicality of school and
representativeness of school population for the
district involved and (2) favorable sentiment 
toward activity practices. Since no teacher 
was to be compelled to participate in the 
program, it was obviously desirable to develop 
the program in schools in which many of the 
teachers were interested and willing to par- 
ticipate. 

It was not deemed feasible to carry out an 
adequate evaluation experiment upon a scale 
so large as to include all of these 70 schools. 
Most of the research involved in the evalua- 
tion, therefore, was confined to eight pairs of 
schools. Each of eight activity schools was 
matched with a control school. Each control 
school was selected from the same neighbor- 
hood as its paired activity school. Further- 
more, control schools were selected which 
were, in the judgment of authorities at the 
Board of Education, equivalent to the cor- 
responding activity schools in the national or 
racial backgrounds and in socio-economic
status of the pupils. In order to have a fur- 
ther objective check upon the adequacy of 
the matching, a test of intelligence was pre- 
pared by Drs. McCall and Herring. In an 
earlier report of this research enterprise,(6) 
groups of children in the activity and the 
control schools were found to be practically 
equal in intelligence, on the average, with a 
difference of one-tenth of a school grade in 
favor of the control group. 

The evaluation project included a testing 
program and a program of observation of 
classroom activities. The testing program in- 
cluded all of the children who were in grades 
four, five and six of the eight pairs of schools 
at the time the tests were administered. The 
observation of classroom activities, however, 
was limited to a selection of class groups. 
Limitations in the number of observers and 
the number of classes which it was possible 
for one observer to cover, made it necessary 
at the beginning to restrict this phase of the
experiment to four classes in each activity
school and its paired control school. The 
observations were originally based, then, 
upon a core of 32 classes in each group. Dur- 
ing certain semesters, increases in the per- 
sonnel of the project made it possible to ex- 
tend the observations to one or two addi- 
tional pairs of schools. Table I gives the per- 
tinent information concerning the classes 
studied for each of the semesters during 
which observations were carried out. 

In many of the analyses of test results to 
be reported in this study, a smaller sample 
has been taken from the total group of indi- 
viduals tested. This sample is composed, in 
each case, of individually matched pairs of 
activity and control children, matched on the 
basis of sex, chronological age, and intelli- 
gence test score. The maximum difference 
allowed between members of a pair was 6 
months in chronological age and 10 points in
McCall Intelligence Test score. 
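
The matching rule lends itself to a brief sketch. The report gives only the criteria (same sex, not more than 6 months apart in age, not more than 10 points apart in test score); the greedy pairing order, the field names and the sample records below are assumptions made for illustration and are not the authors' documented procedure.

```python
def match_pairs(activity, control, max_age_gap=6, max_score_gap=10):
    """Pair each activity child with at most one unused control child."""
    unused = list(control)
    pairs = []
    for a in activity:
        # Control children that satisfy all three matching criteria.
        candidates = [
            c for c in unused
            if c["sex"] == a["sex"]
            and abs(c["age_months"] - a["age_months"]) <= max_age_gap
            and abs(c["score"] - a["score"]) <= max_score_gap
        ]
        if not candidates:
            continue  # this child stays out of the matched sample
        # Assumption: prefer the closest test score, then the closest age.
        best = min(
            candidates,
            key=lambda c: (abs(c["score"] - a["score"]),
                           abs(c["age_months"] - a["age_months"])),
        )
        unused.remove(best)
        pairs.append((a["id"], best["id"]))
    return pairs


# Invented records, for illustration only.
activity_children = [
    {"id": "A1", "sex": "F", "age_months": 120, "score": 100},
    {"id": "A2", "sex": "M", "age_months": 126, "score": 90},
]
control_children = [
    {"id": "C1", "sex": "F", "age_months": 124, "score": 105},
    {"id": "C2", "sex": "M", "age_months": 140, "score": 91},
    {"id": "C3", "sex": "F", "age_months": 121, "score": 102},
]
matched = match_pairs(activity_children, control_children)
```

In this example the second activity child finds no partner, since the only boy in the control list is 14 months older; this shows why the matched sample is smaller than the total group tested.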

The schools also were matched quite 
closely as to general type of neighborhood, 
(East Side, Harlem, Bronx and Queens resi- 
dential, etc.) and as to the nationalities rep- 
resented by the children’s forebears. 


INSTRUMENTS USED 


As indicated above, an effort was made in 
this experiment to tap a wide variety of edu- 
cational outcomes. Many of the devices tried 
out were experimental in character, and some 
were developed expressly for this program of 
evaluation. Though most of them have been 
reported elsewhere (see references at end of 
study), they are not generally known, and so 
it seems appropriate to present the instru- 
ments and techniques in fairly full detail. 

In Table II are listed all the measures
which were obtained at any time during the
investigation. The table also indicates at
what time or times various instruments were
used. Some, as can be seen from the table,
were repeated during each of the five semesters
for which data are now at hand. Others
were administered only once or twice. This
variation was due in part to the direction of
emphasis in the experiment, which was upon
the social performance factors, and in part
to the opportunistic use of any material
which became available through regular testing
carried on in the schools outside of the
immediate scope of this investigation.


TABLE I
CLASSES OBSERVED

First grade  Second grade  Third grade  Fourth grade  Fifth grade  Sixth grade
Act. Cont.   Act. Cont.    Act. Cont.   Act. Cont.    Act. Cont.   Act. Cont.
Spring 1937 _____ 11 11 13 13 8 8
Winter 1937 _____ 2 2 9 10 14 14
Spring 1938 _____ 2 2 6 6 10 11 14 13 12 12
Winter 1938 _____ 8 8 10 10 9 9
Spring 1939 _____ 3 8 8 9 9


TABLE II
DATA ASSEMBLED DURING EVALUATION EXPERIMENT

Term ending: June 1937, Jan. 1938, June 1938, Jan. 1939, June 1939

Coded observations of frequency of social _____ x x x x x
Anecdotal observations of nature of social _____ x x x x x
Test of working skills in social studies _____ x x x
Test of explaining facts in social studies _____ x x x
Test of applying generalizations to social
Test on social beliefs and attitudes _____ x x x
Test of personal and social adjustment _____ x x x
Comprehensive Achievement Test _____ x x x
School Practices Questionnaire
(McCall, Herring, Loftus) _____ x x x x x
Modern School Achievement Test _____ x x x
Intelligence Test (McCall)*

* The intelligence test was given to children in grades 4, 5, and 6 of the schools studied in January,
1937. From then on, it was given each semester in the 4A grades, in order to provide one test of each
pupil in the upper grades.


“CODED OBSERVATIONS”


The “coded observations” were designed to
provide a time-sampling of certain types of
classroom behavior. It is obvious that there
is practically no limit to the varieties of
behavior which an observer in a classroom
might note and record. In order to make the
observations practicable, it was necessary to
be selective and to focus upon certain
categories of behavior to the exclusion of others.
The categories of behavior which were noted,
each of which was designated by its
appropriate symbol in the observer’s record, follow,
with a few illustrative items:


A. Co-operative activities
1. Helping other pupils or teacher with
their problems or projects.
2. Offering objects (book, chair, pencil,
tool, etc.) to teacher, pupil or visitor.
3. Responding quickly to requests for
quiet, materials, etc.


B. Critical activities
1. Criticizing (praising or challenging)
work of others by bringing out good
points, suggesting improvements.
2. By defending points of view.
3. By asking pertinent questions of
teacher or other pupils.


C. Experimental activities 


1. Trying out new things, putting things 
into new combinations, as in manual, 
mechanical, or fine arts, or in social 
studies, natural sciences, mathematics, 
etc. 

2. Creating or constructing an original 
poem, art form or subject, a melody, 
story, chart, diagram, replica, miniature 
building, instrument, etc. 


D. Leadership activities 


1. Organizing, directing, or controlling 
new combinations of persons and 
things (e.g., setting up plan of proce- 
dure, acting as group chairman, etc.). 


E. Recitational activities 


1. Responding on request, largely from
memory, to direct questioning on assigned
textbook or subject matter.

2. Voluntarily responding, largely from
memory, to direct questioning on assigned
textbook or subject matter.




F. Self-Initiated activities
1. Bringing voluntarily contributions
(clippings, exhibits, books, charts, etc.)
for school activities.

2. Submitting voluntarily and orally
data or information gained outside of
school (observations, trips to buildings,
factories, etc.).

3. Presenting a report on a self-directed
investigation.

4. Suggesting methods, materials,
activities, etc. for developing a project.


G. Work-Spirit activities (Negative activities) 
1. Completing work sooner than others 
and not using time wisely. 

2. Depending upon unnecessary help
while working.

3. Not concentrating deeply upon work 
which requires close attention. 

4. Not working as efficiently when the 
teacher leaves the room or is not 
nearby. 

5. Carrying on conversation with neigh- 
bors when attention should be given 
elsewhere. 

6. Touching or borrowing property of 
other children without their permission. 

7. Letting materials or paper remain on 
the floor. 

8. Written work not neatly arranged, not 
legible or free from blots and eraser 
marks. 


These categories were discussed with the 
corps of observers, and the observers were 
given preliminary training in recording the 
behaviors in classroom situations. Problems 
of definition were considered during periodic 
conferences. 


In gathering the data, the observers car- 
ried out a series of half-hour observations in 
each class. Upon entering a given class, the 
observer established herself unobtrusively in 
a position in the room from which she could
get a good overview of the room as a whole. 
The observer endeavored to record, by means 
of the appropriate coded symbol, each occur- 
rence of any of the activities falling within 
the scope of any of the categories defined 
above. The observations of each class were 
repeated, at intervals of one to two weeks, a 
number of times during any given semester. 
The number of periods spent in any class
varied from six to fourteen; all of the main
tables of results beginning with the winter 
term of 1937-38 are based upon at least 
twelve periods of observation in each class- 
room. 


In treating the results, the observations 
from the several periods were thrown to- 
gether, so that a total frequency of each of 
the behavior categories for each of the indi- 
viduals in the class was obtained. In order 
to get a score for each class which would be 
independent of the size of the class and the 
number of periods of observation, the follow- 
ing procedure was used: Add together the 
scores of the individual members in the class 
in a particular category, and divide by the 
number of pupils in the class. This gives the 
average number of behaviors within that 
category per individual in that class. Then 
divide by the number of periods of observa- 
tion. This gives the average number of be- 
haviors per person per period of observa- 
tion. This was then multiplied by 100, yield- 
ing a hypothetical average number of items 
of that particular behavior per person per 
hundred periods of observation. 
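
The arithmetic of this class score can be restated as a short sketch. The function name and the sample tallies are hypothetical; the three steps follow the procedure as described above.

```python
def class_score(individual_totals, periods_observed):
    """Behaviors per person per hundred periods of observation.

    `individual_totals` holds, for one behavior category, each pupil's
    total frequency summed over all observation periods.
    """
    # Step 1: average number of behaviors per pupil in the class.
    per_pupil = sum(individual_totals) / len(individual_totals)
    # Step 2: average per pupil per period of observation.
    per_pupil_per_period = per_pupil / periods_observed
    # Step 3: express on a base of one hundred periods.
    return 100 * per_pupil_per_period


# Four pupils with 3, 0, 6 and 3 recorded acts over twelve periods.
example_score = class_score([3, 0, 6, 3], 12)
```

Here twelve acts among four pupils give three acts per pupil, or one quarter of an act per pupil per period, and hence a score of 25 per hundred periods.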


“ANECDOTAL OBSERVATIONS” 


The coded observations described above 
provided a measure of the frequency of cer- 
tain performance factors in the classrooms 
studied. However, it was recognized that 
there might be other differences than those of 
mere number. A counting of frequencies of 
cooperative acts, for example, might not give 
an adequate comparison of cooperation under 
the two educational regimes. It was deter- 
mined to try to supplement these observations 
of frequency with other measures which 
would take account of differences in quality 
of behavior items observed. 

The measures of quality were obtained 
from anecdotal diary records. During certain 
observation periods in the classrooms, the 
observers endeavored to record, in as much 
detail as possible, as many of the instances 
of the above several categories of behavior 
as they were able to observe and record. 
These raw notes, in some cases taken down 
in shorthand, were subsequently reorganized 
by the observer, and all the items for a par- 
ticular child were listed together under the 
appropriate categories. The following ex- 
cerpts illustrate the nature of this material: 




Samples of Self-Initiated Activities— 
Sixth Grade 


John J.: November 14—Made a report on 
the occupations of Chinese People; gave data 
gathered from several textbooks. November 
19—Brought in newspaper clippings on con- 
ditions in China, which he read to the class. 
November 27—Showed the class pictures of 
two Chinese boys and a picture of a Chinese 
boat; described clothing of Chinese boys. 
December 6—Volunteered to act as leader of 
a class group to investigate type of govern- 
ment China has today. 

Mary S.: November 9 — Helped to make 
a group report on Chinese customs. Decem- 
ber 16—Brought to class a short clipping 
about the China Clipper and read it to the 
class. 


Samples of Co-operative Activities— 
Fourth Grade 


James M.: May 21—Helped to clear off 
and arrange the classroom library table. 
May 22—Helped keep a check on the boys 
during trip. May 23—-Brought in bundle of 
twelve Child Life Magazines. May 24—Vol- 
unteered to remain and clear away materials. 
May 28—Loaned paints and pencil to an- 
other pupil. 

Frank L.: May 21—Responded to request 
for quiet and went to help another pupil put 
away materials. May 22—Helped to clean 
Lorraine’s desk. 

Susan N.: May 21—Stayed late to help 
clean up. May 23—Volunteered to help 
empty basket, wash boards, sweep and to 
help class housekeeper. May 24—Assumed 
housekeeper’s duty when regular housekeeper 
was absent. May 28—Brought broom in at 
lunch time and swept room which was very 
untidy. 

The record for each child for each of the 
behavior categories was then rated, using the 
method of equal-appearing-intervals. An 
eleven point scale of steps from 0 to 10 was
used, each step being assumed to be equal in 
the eyes of the judges doing the rating. Four 
judges rated each record. When a group of 
records was to be rated, the procedure was 
approximately as follows: The judges were 
instructed to read through rapidly a large 
sampling of records from different individuals 
and groups, in order to establish a frame of 
reference in terms of which to rate. Then the
judge went through one record after another
assigning the numerical value which seemed
to him most accurately to place this individual
on an equally-spaced scale for the
measure in question.

The score for an individual was the average
of the ratings of the several judges.
Where no instances of the behavior category
were recorded for an individual, he automatically
received a rating of zero. The rating
for a class was the average of the ratings of
the component members. The raters were
instructed to base their ratings primarily upon
the quality of the behavior shown, although 
frequency also entered into the rating. The 
rating an individual received depended in 
part on the judged excellence of his behavior, 
but also in part upon the frequency of the 
behavior in question. Just what weight was 
actually given to these two aspects of the 
records by the raters, it is not possible to say. 
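
The way the ratings are combined can be sketched as follows. The names and the sample ratings are hypothetical; only the rules (average of the judges, zero where nothing was recorded, class rating as the average of its members) come from the description above.

```python
from statistics import mean


def individual_score(judge_ratings):
    """Average of the judges' ratings on the scale from 0 to 10.

    An empty list stands for a pupil with no recorded instance of the
    behavior category, who receives a rating of zero.
    """
    if not judge_ratings:
        return 0.0
    return float(mean(judge_ratings))


def class_rating(ratings_by_pupil):
    """Average of the individual scores of all members of the class."""
    return mean(individual_score(r) for r in ratings_by_pupil.values())


# Invented ratings from four judges, for illustration only.
sample_class = {
    "pupil_1": [6, 7, 5, 6],
    "pupil_2": [],            # no instances recorded
    "pupil_3": [3, 3, 3, 3],
}
sample_rating = class_rating(sample_class)
```

Since pupils with no recorded instances enter the class average as zeros, the class rating reflects how widespread the behavior was as well as how highly it was judged.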


TESTS


Current Affairs Test.*—An objective test 
was prepared covering knowledge and information
concerning contemporary persons and
events in the fields of politics, art, science, 
sports and the like. This test was intended 
to measure the pupil’s awareness of important 
persons and events in these fields. 

Working Skills in the Social Studies. —This 
test consisted of three parts: the first requires 
the pupil to extract information from tables 
and graphs, the second requires the pupil to 
tell where to locate certain types of informa- 
tion and the third requires the pupil to use 
an index. It is assumed that these performances
are a measure of desirable abilities involved
in the location and utilization of
necessary information.

Explaining Facts in the Social Studies.— 
This test requires the pupil to judge whether 
certain inferences do or do not follow from 
given data. No recall of facts is required,
because all the necessary evidence is contained
in material which is available to the
student. His task is one of comprehension,
inference and judgment.


* This test, and the five immediately following, were
constructed by J. Wayne Wrightstone. Printed copies of
forms of three of the tests, each bound together, may be
obtained at the Bureau of Publications, Teachers College,
Columbia University, under the name “Test of Critical
Thinking in the Social Studies”. These tests are: Working
Skills in the Social Studies (...), Explaining
Facts in the Social Studies (Drawing Conclusions from ...),
and Applying Generalizations to Social Studies Events
(Applying General Facts). The names of the tests in
published form are given in parentheses.




The evidence upon which the student must 
base his inference is tabular or graphical in 
some cases. In other cases it is verbal expo- 
sition. A representative sample of the latter 
would be this passage: 

From the forest trees comes most of the 
wood we use in making pencils, rulers, chairs, 
boats and buildings. We burn some wood in 
stoves and furnaces. Wood from the forest is 
also used in making paper. A well-kept forest 
makes the country beautiful. The forests also 
keep the rain from running off the soil, and 
thus hold a water supply for the small 
streams which flow into big rivers. 

The pupil indicates his judgment of certain 
interpretations of the material presented by 
marking the interpretation as follows: 


(+) Every statement which is true and
can be proved by the facts stated.

(O) Every statement which might be true
but cannot be proved by the facts
stated.

(−) Every statement which is false as shown
by the facts stated.


The interpretations offered for the paragraph
cited above, and the marking which
would give a pupil a perfect score are as
follows:


a. Forests are useful only because
they make a country beautiful ____ 5. (−)
b. The United States has more forests
than any other country ____ 6. (O)
c. Forests supply us with many
things we use in our daily life ____ 7. (+)
d. Big rivers depend on the forests
for some of their water supply ____ 8. (+)


Applying Generalizations to Social Studies 
Events—In this test the student is presented 
with a statement of certain facts or happenings.
He is also supplied with five or six
generalizations, some of which are related to the
described fact or event and some of which
have no bearing upon it. In the first form of
the test, used in 1937, the student was
required to select and mark those generalizations
which were relevant to the described
fact or event. In the form of the test used in
1938 and 1939, the student was required to
match the appropriate generalization with a
set of statements which indicated the
relationship of the generalization to the preceding
paragraph.




A sample item from the first form of the 
test is as follows: 

In the eighteenth century, when there was 
a population of about 4,000,000, farming was 
very different from what it is today. Farms 
were very small. Most of the farming tools
were made by hand. Farmers produced only 
enough to feed their own families. Today, 
with a population of 130,000,000, farmers 
use tractors and electrical machines in work- 
ing their farms. They can also produce 
enough to sell to other people. Why was this 
change brought about? 


1. Crops cannot be raised without the help 
of man. 
2. Good roads and harbors make it easier 
for the farmer to sell his products. 
3. Machines have made great changes in 
the methods of farming. 
4. The purpose of government is to help its
people.
5. When prices of farm products are high 
more people become farmers. 
6. When the population increases, the need 
for food increases. 


Items 3 and 6 were considered to relate to 
the events described, and so a pupil received 
full credit if he checked these two and no 
others. 

The same item in the revised form of the 
test would appear as follows: 


In the eighteenth century, when only a few 
million people lived in the United States, 
almost everyone lived on a farm. Farms were 
very small. Most of the farming tools were 
made by hand. Farmers produced only a 
small amount for the markets. Today, with a 
population of many millions, farmers use 
tractors and electrical machines in working 
large farms. They can now produce plenty to 
sell to other people. 


1. Crops cannot be raised without the
help of man.
2. Good roads and harbors make it
easier for the farmer to sell his
products.
3. Machines have made great changes
in methods of farming.
4. When the population increases the
need for food increases.
5. As industries grow larger, newer
and better machines are used.

10. Explains how farmers are able
to produce so much goods ____ (3)
11. Explains why farms are large ____ (4)
12. Explains why farmers use
tractors ____ (5)




If the student marked his paper as indi- 
cated, 3, 4, and 5, he received full credit. The 
score was taken as the total number of cor- 
rect matchings. 

Since the technique of scoring and the pos- 
sible score on these two forms was quite dif- 
ferent, no direct comparison can be made be- 
tween scores at the 1937 testing and scores at 
later testings. 

Social Beliefs and Attitudes.—In this test 
the student was required to accept or reject 
(mark as true or as false) a number of state- 
ments, rather varied in nature, of which some 
were supposed to represent a progressive and 
some a reactionary point of view on contro- 
versial issues. The following is a sample of 
the statements: 


Eskimos love their homes just as we love 
ours. 

The United States should spend more of its 
money for warships. 

It is all right to pay low wages to workers 
in shops. 

The government does a good thing when it 
gives us free schools. 

Everyone should obey a law whether he 
likes it or not. 


A key was prepared, in terms of which a 
composite score was given to each individual, 
presumably indicating the extent to which he 
takes a generally progressive point of view. 

Personality Trait Indicator —This was a 
paper-and-pencil inventory, in which the 
pupil was required to assay his symptoms, 
actions, preferences, etc. Items were included 
which were deemed symptomatic of personal 
or of social adjustment. The following sample 
gives some idea of the items included: 


Do you like to speak rather than
to write a message?                      A S R

Do you believe in daydreams when
things go wrong?                         A S R

Are you afraid during lightning
and thunder storms?                      A S R

Do you stutter or stammer when
you speak?                               A S R

Do your teachers find fault with
you or your work?                        A S R

Do you talk over your troubles or
worries with your parents?               A S R


The student indicated (by encircling the 
appropriate symbol) whether he always or 
sometimes or rarely exhibited the behavior in 
question. For each item upon which the student 
marked “always” for the adjusted response 
or “rarely” for the non-adjusted one 
he received full credit (2 points). For each 
item marked “sometimes” the candidate received 
half credit. This test yields two scores, 
one for personal adjustment and one for 
social adjustment. These may be combined 
into a single personal-social adjustment score. 
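
The scoring rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item names, the assignment of items to the personal or social scale, and the zero credit given to the opposite extreme are hypothetical, since the text specifies only full credit (2 points) and half credit.

```python
from typing import Dict, Tuple

FULL_CREDIT = 2.0              # "full credit (2 points)" in the text
HALF_CREDIT = FULL_CREDIT / 2  # "half credit" for a "sometimes" response


def score_item(response: str, adjusted_answer: str) -> float:
    """Score one item marked A (always), S (sometimes) or R (rarely).

    adjusted_answer is "A" when the item describes adjusted behavior and
    "R" when it describes non-adjusted behavior. Giving zero credit to the
    opposite extreme is an assumption; the text does not state it.
    """
    if response == "S":
        return HALF_CREDIT
    return FULL_CREDIT if response == adjusted_answer else 0.0


def score_inventory(responses: Dict[str, str],
                    key: Dict[str, Tuple[str, str]]) -> Dict[str, float]:
    """Sum item credits into personal, social and combined scores."""
    totals = {"personal": 0.0, "social": 0.0}
    for item, response in responses.items():
        scale, adjusted_answer = key[item]
        totals[scale] += score_item(response, adjusted_answer)
    totals["combined"] = totals["personal"] + totals["social"]
    return totals


# Hypothetical key: the scale assignments below are illustrative only.
example_key = {
    "stutter": ("personal", "R"),
    "afraid_of_storms": ("personal", "R"),
    "talks_with_parents": ("social", "A"),
}
example_responses = {
    "stutter": "R",
    "afraid_of_storms": "S",
    "talks_with_parents": "A",
}
example_scores = score_inventory(example_responses, example_key)
```

In this hypothetical case the pupil earns 3 points for personal adjustment, 2 for social adjustment, and 5 on the combined score.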

Comprehensive Achievement Test. — The 
Comprehensive Achievement Test, prepared 
by Dr. William A. McCall and Dr. John P. 
Herring, “aims to measure, by sampling, 
everything important which a child ought to 
learn and which he can tell in a brief paper 
and pencil test.”(7) The nature of the test 
can be conveyed in some measure by examining 
the titles of the brief subtests. These are: 


A. Health and play. 
B. Reading. 
C. Finding information. 
D. Speaking, writing and spelling. 
E. Arithmetic. 
F. Arts and crafts. 
G. ... the world in which you live. 
H. Buying and using things. 
I. Being a sensible and useful citizen. 
J. Watching the progress of the world. 
K. Choosing the best experiences. 
L. Talking things over, handling disagreements, and getting things done. 
M. Foreseeing consequences. 
N. Understanding people and things (Camouflaged; really measures prejudices). 
O. Remembering things (Camouflaged; really measures truthfulness during the test). 
P. Keeping your temper. 
Q. Manners. 
R. Modesty (Camouflaged; really measures inferiority feelings). 
S. Enjoying life. 


The different subtests are made up of from 
three to eight individual items. 

Modern School Achievement Test.—The 
Modern School Achievement Test is a well- 
known test of achievement in the traditional 
school skills.(8) The six subtests measure 
respectively reading comprehension, reading 
speed, arithmetic computation, arithmetic 
reasoning, spelling and language usage. 




McCall Intelligence Test—The McCall 
Intelligence Test is a multiple-response intel- 
ligence test of the multi-mental type.(9) The 
test was designed to be as free as possible 
from school achievement. All the words in the 
test were taken from the easiest thousand 
words in Gates’ Reading Vocabulary for Pri- 
mary Grades. It seems, therefore, that in 
grades four and above the test is not a test 
of ability to read words. The essential ele- 
ment of the test would seem to be the appre- 
hension of the varying relationships between 
the words or numbers included in a single 
item. 

School Practices Questionnaire.—This instrument,(10) 
prepared by Drs. McCall, Herring 
and Loftus, in cooperation with The New 
York Principals Association, endeavors to 
determine the extent to which activity prac- 
tices are actually in operation within a given 
classroom. Authoritative statements of the 
philosophy of the Activity Program were an- 
alysed, yielding a list of basic principles, 
characteristics or criteria. Concrete  evi- 
dences of each criterion, readily and objec- 
tively observable by a pupil, were incorpo- 
rated into a questionnaire form. From this, 
it was possible to derive a single score which 
was deemed a measure of the extent to which 
the particular pupil in question had partici- 
pated in those aspects of democratic activity 
which the philosophy of the activity program 
strives to foster. 


PART III 
MEASUREMENTS OF RELIABILITY 


Inasmuch as some of the tests and methods 
are relatively new and untried, it is particu- 
larly important to make generally available 
any evidence which we have concerning their 
reliability. For the purpose of the present in- 
vestigation, where interest centers not upon 
the individual but upon rather large groups, 
tests of fairly low reliability can be tolerated, 
as long as the errors are random ones and do 
not bias the evidence with regard to the com- 
parison of schools. However, the reliability 
should be investigated both for its immediate 
practical and its broader theoretical importance. 
Reliability will be studied most extensively 
for those measures which have been 
less thoroughly explored in previous research. 




CODED OBSERVATIONS 


Unreliability may appear in the coded 
observation scores from any of the following 
sources: 

(1) The limited sampling of behavior. 
Obviously a limited number of obser- 
vations of a classroom give only a par- 
tial sample of all the events in that 
classroom, and another sample will 
give frequencies of these behaviors 
which differ somewhat from those in 
the first limited sample. 


(2) Incomplete observation. It seems self-evident 
that the events which any 
observer sees and records will be only 
part of the totality of events occurring 
in that classroom. Unreliability may 
arise from the omissions which are in- 
evitable when a person tries to observe 
a complex situation. 


Even when all of the thirty to fifty children 
in a class are intent upon one project there 
is likely to be a good deal of by-play by indi- 
viduals or small groups, much of which 
properly belongs in one or the other of the 
categories of behavior included in this study, 
but which an observer would not be able to 
see and much less find time to record. Results 
in a study by one of the authors(11) 
indicate that an observer is likely to miss 
items when trying to do no more than to 
make a coded record of the interchange of 
teacher and pupil questions, comments and 
answers during a more or less formal recita- 
tion, leaving out of consideration any attempt 
to record what individual pupils, who are not 
reciting at the moment, may be doing. Accu- 
rate observation and recording of what hap- 
pens becomes even more difficult when the 
pupils are working in small clusters or groups, 
or on individual projects, in various parts of 
the classroom. It is impossible for a single 
observer to see and record all that goes on in 
such a situation. Moreover, the amount of 
agreement between the records of independ- 
ent observers may quite fail to give an ade- 
quate account of observer reliability. Both 
observers may similarly take note of the more 
striking events, each may influence the other 
by the direction of his gaze, and thus the two 
workers may show a considerably higher 
degree of agreement than would appear if 
the records of each were compared with a 
complete record of all that took place within 
and outside the line of vision (including 
many happenings that may take place behind 
the observer’s back). 


(3) Varying interpretation. It is altogether 
possible that different observers, or 
even the same observer at different 
times, may vary in the interpretation 
which they will give to the same ob- 
served item of behavior. Despite con- 
ferences and discussions, there may 
be constant differences in the records 
which different observers would turn 
in for the same series of events. 


Two procedures were used to measure the 
reliability of the coded observations. To ob- 
tain an indication of the reliability of the 
observers, comparisons were made between 
the records obtained when two workers simul- 
taneously, but independently, observed in the 
same classroom. Secondly, the records from a 
series of observations of the same classes 
were divided into two equal parts, and split- 
half reliability coefficients were obtained be- 
tween the two portions of the data for each 
of the categories being studied. These coeffi- 
cients give an indication of how consistent 
the performance of a class is from day to day, 
relative to other classes, as seen by a particular 
observer. 

Reliability of the Observers.—During one 
school day in the Spring semester of 1938, 
each observer was accompanied on his or her 
rounds by a check observer. The two simul- 
taneously observed and recorded in each of 
the four activity and four control classes be- 
longing to the regular observer. Some of the 
regular observers were checked by an extra 
trained worker who was available at that 
time; in other cases, pairs of regular observers 
would go on one day to the classes of one 
of the pair, and at another time to the classes 
of the other. 

The coded records thus obtained were 
subsequently compared, item by item (not, 
as in an earlier preliminary project, accord- 
ing to overlap of total items, regardless of 
the identity of each), to find the extent of 
agreement and disagreement. To qualify as 
an instance of agreement, it was necessary 
that the behavior in question was similarly 
recorded by both observers, attributed to the 
same child, and noted as occurring at a given 
time (as judged by the sequence of happen- 
ings noted in the record). 
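
The item-by-item comparison described above might be sketched as follows. The representation is an assumption made for illustration: each recorded item is a tuple of a coarse position in the sequence, a behavior category and a pupil, and two items agree only when all three coincide. The name `compare_records` and the sample records are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# One recorded item: (position in sequence, behavior category, pupil).
Item = Tuple[int, str, str]


def compare_records(record_a: List[Item],
                    record_b: List[Item]) -> Tuple[int, int, int]:
    """Return (pairs in agreement, items only in A, items only in B).

    An item agrees only if the other record holds an identical item,
    so an overlap in total counts alone never counts as agreement.
    """
    count_a, count_b = Counter(record_a), Counter(record_b)
    matched = sum((count_a & count_b).values())  # multiset intersection
    return matched, len(record_a) - matched, len(record_b) - matched


# Hypothetical records of two observers in the same class.
regular = [(1, "cooperative", "X"), (1, "critical", "Y"), (2, "recitational", "Z")]
check = [(1, "cooperative", "X"), (1, "critical", "W"),
         (2, "recitational", "Z"), (2, "leadership", "X")]
result = compare_records(regular, check)
```

Here two pairs of items agree; the critical item attributed to different pupils counts as a disagreement in both records, and the leadership item appears only in the check observer's record.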




The tables below show the results obtained 
from these measurements. Separate results 
are shown for each of the regular observers 
in each behavior category, and a final table 
summarizes the separate tables. 


Agreement between observers is shown in 
terms of percentages. The percentage of 
agreement represents the ratio of (a) the sum 
of items in the record of each observer that 
agree with items in the record of the other 
observer, to (b) this total plus items that are 
in disagreement. This method of comparative 
agreement has been used extensively in 
studies that have employed the method of 
direct observation. 


For example, Observer A’s record includes 
16 items; 12 of these agree with items in B's 
record, while 4 disagree (e.g., in one case A 
has attributed a given item of behavior to 
pupil X, but B has attributed it to pupil Y: 
in another instance an item credited to Y by 
A has been credited to X by B; in two in- 
stances, A has noted items of behavior which 
do not appear in B’s record at all). Observer 
B’s record includes 14 items; 12 of these 
agree with A’s and 2 disagree, as noted. The 
percentage of agreement, according to the 
procedure described above, is 
$\frac{12 + 12}{12 + 12 + 4 + 2} = \frac{24}{30}$, or 80 
per cent. This method of computation has 
been described by Arrington.(12) 
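
As a check on the arithmetic, the computation can be written as a short Python function. The function name and argument names are illustrative, not part of the original procedure; the figures are those of the worked example above.

```python
def percent_agreement(agree_a: int, agree_b: int,
                      disagree_a: int, disagree_b: int) -> float:
    """Percentage of agreement between two observers' records.

    Numerator: items in each record that agree with the other record.
    Denominator: that sum plus the items in disagreement.
    """
    agreements = agree_a + agree_b
    total = agreements + disagree_a + disagree_b
    if total == 0:
        raise ValueError("no items recorded; percentage is undefined")
    return 100.0 * agreements / total


# Worked example from the text: A has 12 agreeing and 4 disagreeing items,
# B has 12 agreeing and 2 disagreeing items.
example = percent_agreement(12, 12, 4, 2)  # 80.0
```

The explicit error for an empty record mirrors the cases in which a pair of observers noted no items of a category, for which no percentage can be given.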

Tables III and IV which follow show 
(1) the total number of items in each observer's 
record that corresponded to items in the 
other's; (2) this total plus items on which the 
two records disagreed; and (3) the per cent 
of agreement. It will be noted, in the case of 
the first entry in the table, that the two observers 
agree on 59 items. The second horizontal 
entry, 73, indicates that the two observers 
disagreed on 14 items (73 - 59). 
The per cent of agreement, computed according 
to the method described above, is 
$\frac{59 + 59}{59 + 59 + 14}$, or 89 per cent. 

A simpler method of computing agreement 
would consist in dividing the number representing 
the total pairs of items in agreement 
by the sum of such pairs plus disagreements 
(59 ÷ 73 in the first entry). Obviously this 
method gives a lower quotient. The formula 
which gives the more favorable picture is 
used here by virtue of past usage, but, in 
reproducing the actual tallies, the table also 
affords a means of computing the less favorable 
percentage. 


TABLE III 


AGREEMENT ON CODED OBSERVATIONS. AGREEMENT BETWEEN REGULAR OBSERVER AND CHECK 
OBSERVER DURING PERIODS OF JOINT OBSERVATION IN FOUR ACTIVITY AND FOUR CONTROL 
CLASSES. THE REGULAR OBSERVERS ARE IDENTIFIED BY CAPITAL LETTERS. COLUMN I REPRESENTS 
NUMBER OF ITEMS SIMILARLY ENTERED IN EACH OF THE TWO OBSERVERS' RECORDS. 
COLUMN II REPRESENTS SUM OF THIS VALUE PLUS DISAGREEMENTS. FOR METHOD OF 
COMPUTING PERCENTAGE OF AGREEMENT, SEE ACCOMPANYING TEXT 


                  Control              Activity           Cont. and Act. 
Observer       I    II  % Agr.      I    II  % Agr.      I     II  % Agr. 

Recitational Activities 
A             59    73    89       59    75    88       118   148    89 
B            241   255    97       16    20    89       257   275    97 
C             97   142    81       36    51    83       133   193    83 
D            135   147    96       12    13    96       147   160    96 
E             86   118    84       14    21    80       100   139    84 
F              7     8    93       13    16    90        20    24    91 
G            103   109    97       34    41    91       137   150    95 
H             47    58    90       32    49    79        79   107    85 
Cum. Tot.    775   910    92      216   286    86       991  1196    91 
Med. Agr.                 91.5                 88.5                  90 

Cooperative Activities 
A             35    45    88        9    20    62        44    65    81 
B              9    17    69        8    13    76        17    30    72 
C              4     7    73        7    14    67        11    21    69 
D             26    29    95       12    18    80        38    47    89 
E             10    19    69       10    15    80        20    34    74 
F             27    36    86       27    34    89        54    70    87 
G             27    29    96       48    77    77        75   106    83 
H             20    23    93       22    33    80        42    56    86 
Cum. Tot.    158   205    87      143   224    78       301   429    82 
Med. Agr.                 87                   78.5                  82 

Self-Initiated Activities 
A              8    13    76       44    59    85        52    72    84 
B              5     7    83       17    23    85        22    30    85 
C              6    11    71       27    41    79        33    52    78 
D              2     2   100       62    71    93        64    73    93 
E              4     7    73        9    16    72        13    23    72 
F              4     5    89       23    28    90        27    33    90 
G              4     4   100        1     2    67         5     6    91 
H             52    60    93       19    23    90        71    83    92 
Cum. Tot.     85   109    88      202   263    87       287   372    87 
Med. Agr.                 86                   85                    87.5 

Critical Activities 
A             53    63    91       72   105    81       125   168    85 
B              9    15    75       15    19    88        24    34    83 
C              7    17    58       11    34    49        18    51    52 
D              3     6    67        9    12    86        12    18    80 
E              5     6    91       15    20    86        20    26    87 
F             20    23    93       39    47    91        59    70    92 
G             12    16    86        5     9    71        17    25    81 
H             53    81    79       23    29    88        76   110    82 
Cum. Tot.    162   227    83      189   275    81       351   502    82 
Med. Agr.                 82.5                 86.5                  82.5 

Leadership Activities 
A              0     0    --        2     3    80         2     3    80 
B              0     0    --        0     2     0         0     2     0 
C              2     2   100        0     0    --         2     2   100 
D              0     0    --        3     3   100         3     3   100 
E              0     0    --        1     1   100         1     1   100 
F              0     0    --        4     4   100         4     4   100 
G              1     1   100        1     4    40         2     5    57 
Med. Agr.                100                  100                   100 

Experimental Activities 
A             44    49    95        7     8    93        51    57    94 
B              0     0    --       66    67    99        66    67    99 
C              0     0    --       32    56    73        32    56    73 
D             41    41   100       65    65   100       106   106   100 
E              0     0    --       65    65   100        65    65   100 
F             32    35             57    65    93        89   100    94 
G              0     0    --       54    83    79        54    83    79 
H              7     9    88       36    36   100        43    45    98 
Cum. Tot.    124   134    96      382   445    92       506   579    93 
Med. Agr.                 95                   96                    96 

(Negative) Work-Spirit Activities 
A              1     2    67        4     8    67         5    10    67 
B              0     5     0        1     2    67         1     7    25 
C              6    10    75        0     6     0         6    16    55 
D              0     0    --        0     0    --         0     0    -- 
E              0     0    --        1     1   100         1     1   100 
F              7     7   100        0     0    --         7     7   100 
G              0     0    --        0     0    --         0     0    -- 
H              0     0    --        0     6     0         0     6     0 
Cum. Tot.     14    24    74        6    23    41        20    47    60 
Med. Agr.                 71                   67                    61 


TABLE IV 


MEDIAN AND RANGE OF PERCENTAGE OF AGREEMENT BETWEEN EACH OBSERVER AND HIS OR HER 
CHECK OBSERVER, BASED UPON ALL CATEGORIES COMBINED 


                  Control              Activity         Control and Activity 
Observer       Range   Median       Range   Median       Range   Median 
A              67-95    88.5        62-93    81          67-94    84 
B               0-97    75           0-99    85           0-99    83 
C             58-100    74           0-83    70         52-100    73 
D             67-100    96         80-100    94.5       80-100    94.5 
E                                  72-100    86         72-100    87 
F             86-100    93         89-100    90.5       87-100    91 
G             86-100    97          40-91    74          57-95    82 
H                                   0-100    88          0-100    86 


Table IV shows the range and median 
agreement between each observer and his or 
her check observer in all categories combined. 

Table III shows large fluctuations in the 
amount of agreement exhibited by different 
pairs of observers in the different categories 
of behavior. Among other things, it appears 
that sharp fluctuations, from zero to 100 per 
cent agreement, occur in connection with certain 
types of behavior that occur relatively 
infrequently. In keeping with the fact that 
certain activities (e.g., recitational) occur relatively 
much more frequently than certain 
other activities (such as leadership, negative 
work-spirit) the total number of items of behavior 
on which the computations of percentage 
of agreement are based varies decidedly 
as between different categories of activity, 
and as between different pairs of observers. 
It will be noted that in the case of most of 
the categories of behavior there are several 
instances in which a given pair of observers 
have noted only a small number of items of 
the behavior in question. In many comparisons, 
where only a few instances of a certain 
type of behavior are involved, it would appear 
that the sampling is too small to provide 
an adequate measure of observer reliability. 

From Table IV, which summarizes the 
values obtained for each regular observer 
when all categories are combined, it appears 
that there is a good deal of variation between 
observers (ranging from a median agreement 
of only 73 per cent in the case of one observer 
to an agreement of 94.5 per cent at the other 
extreme). Just how high the percentage of 
agreement between two observers should be 
to constitute a sufficiently high degree of 
observer reliability for the purposes of an in- 
vestigation such as this cannot be determined 
from the data of this study. In view, how- 
ever, of the fact that the formula used in 
computing the percentage of agreement is 
one calculated to yield the most favorable 
outcome, it might reasonably be maintained 
that observer agreement on a substantial 
sampling to the extent of at least about 85 
per cent would be required if the data were 
to be used for the study of individual chil- 
dren. However, since the present data are 
used primarily for gross comparisons between 
groups, rather than individuals, a lower de- 
gree of observer reliability may be tolerated. 
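
The way a Table IV entry summarizes one observer's Table III percentages can be illustrated with a small Python sketch. The values used are Observer A's control-class percentages as printed in Table III; the assumption, not stated in the text, is that a category with no recorded items (here leadership) is simply left out of the range and median.

```python
from statistics import median
from typing import List, Optional, Tuple


def summarize_agreement(percentages: List[Optional[float]]) -> Tuple[float, float, float]:
    """Return (lowest, highest, median) of the defined percentages.

    None marks a category in which the pair of observers recorded no
    items, so that no percentage of agreement exists (an assumption
    about how such categories were treated).
    """
    defined = [p for p in percentages if p is not None]
    if not defined:
        raise ValueError("no defined percentages to summarize")
    return min(defined), max(defined), median(defined)


# Observer A, control classes, one percentage per behavior category.
observer_a_control = [89, 88, 76, 91, None, 95, 67]
summary = summarize_agreement(observer_a_control)  # (67, 95, 88.5)
```

With six defined values the median falls midway between the third and fourth, which is why half-point medians such as 88.5 appear in the table.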

The fact that the regular observers showed 
varying degrees of agreement with the check 
observer introduces an uncontrolled factor 
into the results that are shown in Part IV, 
which reports comparisons between activity 
and control schools over a period of several 
semesters. Where fluctuations occur, it can- 
not be determined to what extent these are 
due solely to actual changes in the relative 
frequency of various types of behavior from 
one term to the next, and to what extent they 
may be influenced by variables that involve 
only the observers, including changes in 
observer personnel. 


Change in Observer Personnel.—Table V 
indicates the changes in observer personnel 
which took place during the entire observa- 
tional program. An observer normally took 
records during approximately four months of 
the usual five months of the school term. 
Cases in which less than four months are in- 
dicated represent either the illness or sudden 
leaving of an observer. No record of any observer 
was used in the final results unless 
that observer was present for at least half of 
the term. The number of observers during 
any particular term was a function of the 
number of schools included in the study and 
the number of classes observed. From June 
1937 to June 1939, 8, 9, 10, 9, 8 pairs of 
schools were observed respectively, employing 
8, 9, 13, 9, 8 observers. The table does not 
indicate the shift of observers from school to 
school. 


Table V indicates that five observers A, 
B, C, D and E were present throughout the 
entire program. One observer (Observer I) 
was present for four of the five terms; 
Observer J was present for three terms; 
Observers F, M, N and Q were present for 
two terms and Observers G, H, P and R were 
present for one term. 


TABLE V 


ROSTER OF OBSERVERS. THE NUMBERS IN THE COLUMNS INDICATE THE APPROXIMATE LENGTH 
OF TIME IN MONTHS EACH OBSERVER WAS PRESENT IN ANY PAIR OF SCHOOLS 


Spring- 
Spring—1937 Winter, 1937 Spring—1938 Winter, 1939 Spring, 1939 
4 4 


Observer 

4 4 4 4 

4 4 4 4 

4 4 4 4 
4 4 4 3% 

4 4 4 4 




Relative Difficulty of Record-Taking in the 
Two Types of Classes —There is one rather 
important practical consideration that should 
be borne in mind in connection with the 
tables that follow. During periods when chil- 
dren work individually or in small clusters, 
with a good deal of freedom to confer and to 
move about, it is physically impossible for 
an observer to keep his eye upon everything 
that goes on. As one would expect, such 
periods of free activity occur more often in 
activity than control schools. The one type 
of activity that is especially likely to be 
diminished at such periods is formal recita- 
tion. The items of behavior which the ob- 
server misses by reason of inability to watch 
everything at once will consist mainly of ac- 
tivities other than recitational activities. In 
other words, it is possible that a complete 
account of all that happened would substan- 
tially increase the tallies in the remaining 
categories. Just how these tallies might be 
distributed among the remaining categories 
cannot, of course, be surmised. In any event, 
in view of the trend of the findings shown in 
Part IV, there is reason to believe that this 
factor of unreliability tends to minimize the 
differences between activity and control 
classes in the very types of behavior in which 
the findings show the activity classes to be 
in the lead. In other words, it is here argued 
that complete observer reliability would not 
only confirm, but perhaps substantially augment, 
the differences in favor of the activity 
schools. It must be pointed out, however, that 
this line of argument would be more convincing 
if observer reliabilities in the non-recitational 
categories were consistently 
higher in non-activity than in activity classes. 
Actually, the percentages of agreement be- 
tween observers are practically the same jp 
the two types of classes. 

Reliability of the Sampling.—Tables VI 
and VII give an indication of the extent to 
which ten half-hour periods of coded observa- 
tion afford a reliable sampling of the behavior 
that is being studied. In Table VI are pre- 
sented reliability coefficients obtained by cor- 
relating the first five periods of a series of ten 
periods of observation with the second five. 
These observations were carried out during 
the semester ending in January 1938. The 
table represents data on 51 classes, 25 from 
control and 26 from activity schools. 
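
The split-half procedure just described can be sketched in Python: the class totals from the first five periods are correlated with those from the second five, and the result is stepped up to the full ten periods by the Spearman-Brown formula named in the note to Table VI. The class tallies below are hypothetical; only the coefficient .78 and its corrected value .88 come from Table VI.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series of at least 2 scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Estimated reliability when the sample is lengthened `factor` times."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


# Hypothetical class totals for one category: periods 1-5 and periods 6-10.
first_half = [12, 30, 7, 22, 15]
second_half = [10, 33, 9, 20, 18]
r_half = pearson_r(first_half, second_half)
r_full = spearman_brown(r_half)

# Check against Table VI: cooperative activities, all 51 classes.
corrected_cooperative = round(spearman_brown(0.78), 2)  # 0.88
```

The `factor` argument is kept general so that the same function covers other step-ups, such as estimating the reliability of four raters from the agreement of two pairs.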

It is apparent from these results that there 
are wide variations in the reliability of a 
limited sample of observations of this kind. 
On the basis of 10 periods of observation it 
seems that one gets a fairly reliable picture 
of cooperative and work-spirit activities in 
either control or activity classes and in the 
data from these classes combined. Critical 
activities and leadership activities seem to be 
adequately determined in the activity classes 
and in the combined data, but these categories 
do not appear consistently in the control 
classes. The data on experimental, recitational 
and self-initiated activities are found 
to be so unreliable in either the separate or 
combined data that their use, even for group 
comparisons, seems somewhat hazardous. 


TABLE VI 


SPLIT-HALF RELIABILITY FOR 10 PERIODS OF CODED OBSERVATIONS. CORRELATIONS BETWEEN 1ST 
AND 2ND 5 PERIODS OF OBSERVATION IN EACH OF 51 CLASSES IN 14 SCHOOLS (7 ACTIVITY 
AND 7 CONTROL); SEPARATE CORRELATIONS ARE ALSO SHOWN FOR THE ACTIVITY AND THE 
CONTROL SCHOOLS. THESE COMPUTATIONS ARE BASED ON RESULTS OBTAINED IN 1938* 


                             51 classes in 14         25 classes in 7          26 classes in 7 
                             schools: 7 activity      control schools          activity schools 
                             and 7 control 

                                        Corrected                Corrected                Corrected 
                              r   P.E.     r           r   P.E.     r           r   P.E.     r 
Cooperative activities       .78  .04     .88         .76  .06     .86         .77  .06     .87 
Critical activities          .70  .05     .82         .29          .45         .86  .04     .92 
Experimental activities      .28  .09     .44         .41  .11     .58         .18  .13     .30 
Leadership activities        .82  .03     .90         .01  .14                 .88  .03     .94 
Recitational activities      .51  .07     .68         .46  .11     .63         .56  .09     .72 
Self-Initiated activities    .43  .08     .60         .54  .10     .70         .30  .12     .47 
Work-Spirit activities       .88  .02     .94         .91  .02     .95         .87  .03 


* The Pearson product-moment formula has been used in computing all correlations in this 
and other tables, unless otherwise stated. The corrected r's have been obtained by the 
Spearman-Brown formula. 



The above coefficients give an indication 
of the adequacy of ten periods of observation 
as a sampling of the behavior under observa- 
tion. However, since the scores that were cor- 
related in the case of each class represented 
totals for the class as a whole (rather than 
scores of individual pupils) as derived from 
the records of the same observer at different 
times, the coefficients inevitably are influ- 
enced by the factor of observer reliability. In 
a sense they measure the extent to which 
there is a consistency in the frequency of 
items of a given category recorded by a given 
observer in the same class from time to time. 
To the extent that the original observations 
are reliable, the above coefficients may be 
regarded as a measure of the reliability of the 
sampling. However, to the extent that the 
original observations are unreliable, to that 
extent these coefficients show degrees of con- 
sistency in what the observer happened to 
see and record rather than in the frequencies 
of the various forms of behavior as they 
actually occurred. 

As already indicated, the data presented 
above are for classes as units. Most of the 
results upon coded observations which are 
used in this investigation are based upon these 
class units. Since, however, two of the tables 
that follow are based upon scores of partic- 
ular individuals, it may be of interest to in- 
quire into the reliability of the scores of 
single individuals. Split-half correlations were 
computed for the records of 1833 pupils in 
the classes discussed above, considering the 
record of an individual as the unit. These 
results are presented in Table VII. 


TABLE VII 


RELIABILITY OF CODED OBSERVATION SCORES OF 
SINGLE INDIVIDUALS. CORRELATIONS BETWEEN 
1ST AND 2ND 5 PERIODS OF OBSERVATION 
FOR EACH OF 1833 INDIVIDUAL PUPILS 
IN 51 CLASSES 


                                           Corrected 
                               r    P.E.       r 
Cooperative activities 
Critical activities          .368   .014     .537 
Experimental activities      .246   .015     .394 
Leadership activities        .188   .015     .317 
Recitational activities      .297   .014     .458 
Self-Initiated activities    .308   .014     .471 
Work-Spirit activities       .375   .014     .545 


The median coefficient (after correction by 
the Spearman-Brown formula) is .471. This 
table emphasizes, as is pointed out elsewhere 
in this report, that the data obtained by 
means of the coded observations have little 
reliability as far as individual pupils are 
concerned. 


ANECDOTAL OBSERVATIONS 


Unreliability in the scores on the anecdotal 
records may arise from the following sources, 
singly or in combination: (1) fluctuation in 
the rating given to a particular set of records; 
(2) fluctuation in what is observed and 
recorded from a sequence of classroom occurrences; 
(3) fluctuation from day to day in 
the occurrences within a given classroom. An 
index of the consistency of rating the same 
set of records may be obtained by correlating 
the ratings given by separate raters to the 
same set of records. Data on this point are 
presented in Table VIII. Some measure of 
the combined effect of rater unreliability and 
observer unreliability can be gotten by correlating 
the ratings given to records made 
simultaneously by two independent observers. 
Evidence on this point is presented in Table 
IX. The effect of sampling unreliability, together 
with rater unreliability and some fraction 
of observer unreliability, could be studied 
by correlating the ratings given to different 
samples of behavior in the same classes. We 
have no data to present on this point at this 
time. 


TABLE VIII 


RELIABILITY OF RATINGS OF ANECDOTAL RECORDS. 
CORRELATIONS BETWEEN CLASS AVERAGES 
OF RATINGS ASSIGNED TO THE SAME 
RECORDS BY TWO SETS OF TWO INDEPENDENT 
RATERS. BASED ON RECORDS OBTAINED IN 38 
ACTIVITY AND 38 CONTROL CLASSES IN JUNE 
1938 


                                           Corrected 
Category                             r         r 
Cooperative activities             .936      .967 
Critical activities                .884      .938 
Experimental activities            .881      .937 
Leadership activities              .960      .980 
Recitational activities            .945      .972 
Self-Initiated activities          .944      .971 
Negative Work-Spirit activities    .870      .930 


Table VIII presents data on the reliability 
of rating anecdotal records. In practice, each 
individual's record was rated independently 
by four raters. The ratings thus given to individual 
children were averaged to yield the 
mean for the class. In Table VIII, the class 
average of the ratings by two raters is correlated 
with the average of the ratings by the 
other two. The corrected correlations 
(Spearman-Brown) give the estimated correlation 
of the average of four raters with the 
average of another four raters. These corrected 
correlations are very substantial, ranging 
from .930 to .980 with a median at .967. 
It appears that the ratings have satisfactory 
reliability as far as gross comparisons between 
classes are concerned. 

In computing these correlations, there 
arose the problem of handling cases in which 
a child had no items recorded in a particular 
category, and in which he would consequently 
automatically receive a rating of zero. Since 
the inclusion of these cases raised the cor- 
relations in a way which seemed spurious, 
they were omitted in the calculations upon 
which Table VIII is based. If these cases 
were to be included, the median corrected 
correlation would be raised from .967 to .985. 
The range would then extend from .965 to 
.996. 

The correlations in Table IX were obtained 
by sending a check observer into the class- 
room with the regular observer. These two 
observers worked simultaneously but inde- 
pendently, each making notes of observed 
items of classroom behavior. The items in a 
given category were then grouped together, 
so that the records of a whole class could be 
treated as a unit, and, finally, these records 
were rated by a team of four independent 
raters. The size of the correlations in Table 
IX is a function of (1) the adequacy of rat- 
ing, as studied in Table VIII, and (2) the 
uniformity of observing and reporting by 
observers. 


TABLE IX 


CORRELATION BETWEEN AVERAGE CLASS RATINGS 
OF ANECDOTAL RECORDS OF REGULAR 
AND CHECK OBSERVERS. BASED ON SIMULTANEOUS 
BUT INDEPENDENT OBSERVATIONS OF 
THE SAME CLASSES. DATA GATHERED IN 1937 
AND IN 1938 


                                      1937               1938 
                                 Rank-difference    Product-moment r 
Category                               r           Activity   Control 
Cooperative activities                .38            .87        .88 
Critical activities                   .56            .97        .90 
Experimental activities               .68            .88 
Leadership activities                 .89            .75       1.00 
Recitational activities               .40            .81        .80 
Self-Initiated activities             .69            .89 
Negative Work-Spirit activities       .92            .67        .31 



The procedure described above was carried 
out in 1937 and again in 1938. Correlations 
for the 1937 data are rank-difference correlations, 
while correlations for the 1938 data are 
product-moment correlations. Correlations for 
activity and control classes are presented 
separately for the 1938 data. 
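
Since the two years rest on different statistics, it may help to see how a rank-difference coefficient is obtained. The sketch below uses the usual formula, 1 - 6Σd²/(n(n² - 1)), applied to ranks; the class ratings are hypothetical, and the averaging of tied ranks is an assumption, since the report does not say how ties were handled.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_difference_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Rank-difference correlation; exact only when there are no ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series of at least 2 scores")
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1.0 - 6.0 * sum_d2 / (n * (n * n - 1))


# Hypothetical class ratings from a regular and a check observer.
regular_ratings = [1, 2, 3, 4, 5]
check_ratings = [2, 1, 4, 3, 5]
rho = rank_difference_r(regular_ratings, check_ratings)  # 0.8
```

Because this coefficient depends only on the ordering of the classes, it need not equal the product-moment coefficient computed on the same ratings, which is one reason the 1937 and 1938 figures are not strictly comparable.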

There seem to be rather marked discrepancies 
between the results for 1937 and those 
for 1938. In the 1938 data, the correlations 
between ratings of observations recorded by 
two independent observers are all .75 or 
higher, with the exception of the ratings of 
negative work-spirit activities, and the median 
of all the correlations is .87. The 1937 correlations 
range from .38 to .92 with a median 
of .68. The highest correlation is for the cate- 
gory of negative work-spirit, which yielded 
the lowest correlation in 1938. The reason for 
these discrepancies is not at all clear. It may 
lie in part in differences in observer (and es- 
pecially check observer) personnel, in part in 
difference in the exact statistic used, in part 
in the vagaries of sampling. If we accept the 
1938 correlations, we would say that different 
observers describe the events in a particular 
classroom group in such a way as to produce 
quite closely comparable ratings, except for 
the category of negative work-spirit. If we 
follow the 1937 results, we must say that the 
consistency of description shows marked vari- 
ation from one category to another, being low 
for some and quite high for others, and that 
the consistency is highest for negative work- 
spirit and leadership activities. 

Although the coefficients in the tables 
above show a wide range, it would appear 
that the underlying data are sufficiently reli- 
able to justify certain gross comparisons be- 
tween the two groups. The data cannot, how- 
ever, justifiably be used for refined treatment 
and comparisons. The reliability coefficients 
above are based, as noted, on the averages 
of scores of groups of children, rather than 
upon a correlation of individual scores. More- 
over, the anecdotal records on which the rat- 
ings are based represent records of whatever 
projects or enterprises the classes might be 
undertaking when the observers happened to 
appear, rather than observations of all classes 
within each group under approximately similar
conditions. Further, as noted on an
earlier page, the qualitative ratings were influenced
not only by the “quality” of each of
the recorded items but also by the quantity 
or frequency of items in a given category. 




PAPER AND PENCIL TESTS 


The test materials which were prepared by 
the staff of this project include tests of 
knowledge of current affairs, of working skills 
in the social studies, of ability to explain 
facts in social studies, of ability to apply gen- 
eralizations to social studies events; also, a 
questionnaire on social beliefs and attitudes, 
and an inventory of personal and social ad- 
justment. The reliability of these tests was 
determined by the split-half technique. Co- 
efficients obtained by correlating one half of 
the items on a test with the other half were 
corrected by the Spearman—Brown formula. 
Coefficients of correlation based upon (1) a 
random sample of 100 cases in grades 4, 5
and 6 tested in the spring of 1937, and 
(2) 300 cases in grades 4, 5 and 6 tested in 
the spring of 1938 are presented in Table X. 
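
The correction mentioned above can be sketched as follows. The function name and the `factor` parameter are illustrative choices, not anything taken from the study; the formula is the usual Spearman-Brown step-up for a test lengthened by a given factor, with a factor of 2 for the split-half case.

```python
def spearman_brown(half_r, factor=2.0):
    """Estimate the reliability of a test `factor` times as long as the
    part on which `half_r` was computed (factor=2 for split halves)."""
    return factor * half_r / (1.0 + (factor - 1.0) * half_r)

# An uncorrected half-test correlation of .68 steps up to about .81,
# and one of .85 to about .92.
for r in (0.68, 0.85, 0.90):
    print(r, round(spearman_brown(r), 2))
```

Pairs of uncorrected and corrected coefficients of this kind are what Table X reports for each test.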

The reliabilities of these tests seem to be 
fairly satisfactory. It must be remembered 
that these coefficients are based upon sam- 
plings from three grades, and it would be ex- 
pected that the reliability for a single grade 
group would be lower. Actually, however, 
correlations computed for single grade groups 
corresponded closely to the results shown in 
Table X. By reason of this conformity, and 
to conserve space, the correlations for the 
separate grades are not reproduced here. 


For the McCall Intelligence Test and for 
the Comprehensive Achievement Test pre- 
pared by Drs. McCall and Herring, data on 
reliability are presented in the test manual. 
The index of reliability (correlation between 
a fallible test and a hypothetical “true” 
measure of the ability) in an age group is 
reported to be .97 for the Intelligence Test 
and .96 for the total score on the Comprehen- 
sive Achievement Test. Scores on the sepa- 
rate subtests of the Comprehensive Achieve- 
ment Test are obviously very much less reliable,
but they are reported in the test manual
to be “reliable for classes, grades, schools and 
school systems.” 
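
As a rough guide to reading these figures, the index of reliability can be converted to an ordinary reliability coefficient. The sketch below assumes the classical test-theory relation in which the index is the square root of the reliability coefficient; the test manual itself is not quoted here, so this is an interpretation rather than the manual's own computation.

```python
from math import sqrt

def index_of_reliability(reliability):
    """Correlation between obtained and hypothetical true scores,
    assuming the classical relation index = sqrt(reliability)."""
    return sqrt(reliability)

def reliability_from_index(index):
    """Inverse of the relation above."""
    return index ** 2

print(round(reliability_from_index(0.97), 2))  # Intelligence Test
print(round(reliability_from_index(0.96), 2))  # Achievement Test, total score
```

On that assumption, indices of .97 and .96 correspond to reliability coefficients of about .94 and .92.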

Concerning the School Practices Question- 
naire, which endeavors to measure the extent 
to which a classroom is actually achieving 
the aims of an activity program as expressed 
in democratic living, the probable error is 
reported in an earlier article to be about 2½
points in score at the level of grades four, five 
and six. 


PART IV 
RESULTS 


SCHOOL PRACTICES QUESTIONNAIRE


The school practices questionnaire was 
administered to a large sample of activity 
and control classes at intervals of approxi- 
mately eight months from June 1937 to 
November 1938. This questionnaire was de- 
signed to determine the degree to which the 
avowed purposes of the activity program were 
actually being achieved, as reflected in the 
experience of the children.

Tables XI and XII give comparisons be-
tween activity and control classes. A differ- 
ence between activity and control classes is 
apparent for each grade and at each testing. 
In other words, there is evidence of a differ- 
ence in fact as well as in name between these 
two groups. On the average, the procedures 
in the officially “activity” classes have been 
altered so that the children report experiences 
more nearly conforming to the ideals set forth 
in the philosophy of the activity movement. 

Comparisons between the different grade 
groups suggest that the differential between 
activity and control classes increases some- 
what as one goes from the fourth to the sixth 
grade. This seems to be in part a function of 
closer approximation to the activity ideal by 


TABLE X

RELIABILITY OF WRIGHTSTONE TESTS. SPLIT-HALF CORRELATIONS FOR A RANDOM SAMPLE OF
CHILDREN IN GRADES 4, 5 AND 6

                                        100 pupils tested     300 pupils tested
                                             in 1937               in 1938
                                          r    Corrected r      r    Corrected r
Working skills in the social studies     .85      .92          .90      .95
Explaining facts                         .68      .81          .68      .81
Applying ... studies                     .73      .84          .87*
Current affairs                          .81      .90          .84      .91
What do you ...                          .56      .72          .68      .81
Individual adjustment                    .79      .88          .74      .85
Social adjustment                        .85      .92          .87*

* Only one 1938 value is legible for this test; whether it is the
  uncorrected or the corrected coefficient is not clear.




TABLE XI

SCORES OF ACTIVITY AND CONTROL CLASSES DURING THREE SUCCESSIVE TERMS ON THE SCHOOL
PRACTICES QUESTIONNAIRE. THE VALUES WERE OBTAINED BY AVERAGING
THE AVERAGE SCORES FOR EACH CLASS

                      No. of   Average of                 S. E. of   Critical
Date test was given   classes  average scores  Difference   diff.      ratio

Activity
March 1938              112       50.45
June 1937                88       49.60            .85       1.08        .79
Nov. 1938               112       55.80
March 1938              112       50.45           5.35       1.18       4.53
Nov. 1938               112       55.80
June 1937                88       49.60           6.20       1.31       4.74

Control
March 1938              134       43.36
June 1937               111*      42.32           1.04       1.08        .96
Nov. 1938               129       44.16
March 1938              134       43.36            .80       1.14        .70
Nov. 1938               129       44.16
June 1937               111       42.32           1.84       1.14       1.61


the activity groups, and in part a function of 
a smaller amount of activity in the control 
groups in the higher grades. 

Table XII gives some suggestion as to the 
progress of the activity movement from term 
to term in these schools. According to this 
table, the activity classes have made signifi- 
cant progress between June 1937 and Novem- 
ber 1938 toward the activity ideal. The con- 
trol groups show an insignificant change in 
the same direction. We find that the differ- 
ence between the two groups increased appre- 
ciably in the time between the first and third 
testing. 

The differences between the two groups do 
not appear large when average scores are 
compared. On a range of possible scores extending
from 0 to 105, the activity group in
the last testing has an average of 56, as compared
with 44 for the control group. However,
we have no general norms in terms of
which we can evaluate these averages. The 
all-or-none nature of the responses on the test 
may tend to mask subtle quantitative differ- 
ences in experience, so that large numerical 
differences on the test should not be expected 
between groups in which practices were 
clearly quite different. Since the test probes 
whether or not a given practice has occurred 
at one time or another during a period of 
time, rather than the extent to which this 
practice has characterized the work of the 
class day by day, the quantitative scores on 
the test perhaps fail fully to discriminate be- 
tween the control and activity classes. To 
measure the actual differences in practice 


TABLE XII

SCORES ON SCHOOL PRACTICES QUESTIONNAIRE AT 4TH, 5TH AND 6TH GRADES. THE VALUES
REPRESENT THE AVERAGE OF THE SEPARATE AVERAGES OF THE
SEPARATE CLASSES FOR EACH GRADE

                       Average of              Average of              Average of
                       average scores          average scores          average scores
                                   Critical                Critical                Critical
                       Act.   Con.   ratio     Act.   Con.   ratio     Act.   Con.   ratio
Total no. of classes    88    111              112    134              112    129
Grade
4th                   48.36  42.56   3.02     49.08  44.45   2.52     51.60  43.50   3.4
5th                   50.25  43.32   3.50     49.16  43.12   4.14     57.87  45.92   6.0
6th                   49.    41.33   4.51     53.00  42.70   5.75     58.35  42.49   6.78
Combined 4th, 5th,
  6th                 49.44  42.32   6.19     50.51  43.35   7.16     55.80  43.95   9.[illegible]




would require more detailed observation in
the individual classrooms. We may say, in 
any event, that there is unquestionably a real 
difference between the two groups which we 
are studying—a difference which has appar- 
ently increased with the passage of time. 


CODED OBSERVATIONS


The tables which follow show comparisons
between activity and control schools on the
basis of the quantitative results obtained by
means of the coded observations. The tables
show averages and the critical ratio of the
difference between each pair of averages.
The averages are based upon the mean scores
of the constituent classes in each group. Each
value represents the hypothetical frequency,
per child, of the behavior in question during
100 periods of observation, as derived from
the original scores. The critical ratio, as is
usual, represents the value obtained when the
difference between two averages is divided by
the standard error of the difference.
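
The two quantities just defined can be sketched in a few lines. The scaling in `per_100_periods` is an assumption, since the text does not say exactly how the original scores were converted; `se_of_difference` assumes the two group means are independent. The worked figures are the activity-class averages for March and November 1938 from Table XI.

```python
from math import sqrt

def per_100_periods(count, children, periods):
    """Frequency of a behavior per child per 100 observation periods
    (an assumed scaling, for illustration only)."""
    return 100.0 * count / (children * periods)

def se_of_difference(se_a, se_b):
    """Standard error of the difference between two independent means."""
    return sqrt(se_a ** 2 + se_b ** 2)

def critical_ratio(mean_a, mean_b, se_diff):
    """Difference between two averages divided by the S.E. of the difference."""
    return (mean_a - mean_b) / se_diff

# Table XI, activity classes: 55.80 against 50.45, S.E. of difference 1.18.
print(round(critical_ratio(55.80, 50.45, 1.18), 2))
```

A critical ratio of 3 or more was conventionally read at the time as a difference very unlikely to arise from sampling alone.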

By reason of the large number of classes
included in the study, and variations in population
from time to time as classes were
added or dropped, it is possible to show a
large number of comparisons. The first column
of figures in Table XIII shows the results
obtained in the 64 classes in the eight
pairs of schools that were included in the
study when the investigation first began.
Other entries in the same table show comparisons
between the same grades (but not the
same classes) in the same schools during the
succeeding four terms.


To the extent that the underlying data are 
reliable, this table should not only provide a 
basis of comparison between activity and con- 
trol schools but should also give an indication 
of the extent to which various types of be- 
havior remain constant or undergo changes 
during the passage of time under the two 
differing school regimes. To measure such 
change it would, of course, be best to obtain 
data on the same children, under a similar 
continuing regime, over a period of time. Due 
to shifts in the school population, however, 
this was not feasible, although some com- 
parisons of this kind, based upon limited 
numbers, are presented at a later point. In 
the comparisons shown in Table XIII the 
factor of the past experience of pupils who
had transferred to the schools in question is
not controlled. 




From Table XIII it appears that the activ- 
ity and control schools show notable differ- 
ences in some categories, little difference in 
others. The most outstanding difference, from 
the point of view of the relative magnitude 
of the averages and of the critical ratios, ap- 
pears in connection with Self-Initiated Activ- 
ities. Throughout the five semesters, the chil- 
dren in activity schools exhibit from about 
two to three times as much behavior in this 
category as the children in the control 
schools. 

Large and substantially reliable differences, 
in favor of the activity groups, appear in the 
frequency of Critical Activities. The activity 
groups consistently show about twice as much 
behavior in this category as do the controls. 
The activity groups also exhibit a higher 
frequency of Leadership Activities. The abso- 
lute frequencies of behavior in this category 
are relatively small (as one would expect by 
virtue of the fact that there is a limit to the 
number of children who can operate as lead- 
ers at a given time), but the differences be- 
tween the two groups of schools during three 
of the five terms are relatively very large. 


The two groups of schools also differ to a 
substantial degree in the frequency of Reci- 
tational Activities. Recitational Activities are 
decidedly less frequent in the activity schools 
during each term. 


The results with regard to Cooperative 
Activities do not consistently favor either 
group; in four of the five comparisons the 
activity classes are in the lead, but the dif- 
ferences are relatively small, and in no com- 
parison is the difference statistically reliable. 

An interesting result appears in connection 
with Negative Work-Spirit Activities. In four 
of the five comparisons, the activity classes 
exhibit a slightly larger number of activities 
in this category than do the children in the 
control schools. At best, however, the fre- 
quencies are small, and the differences are 
neither substantial nor reliable. It should be
pointed out, however, that the findings in 
connection with this category of behavior 
may be considerably more significant than 
would appear from the averages. Less note- 
worthy than the fact that the activity classes 
exceed the control classes in the frequency of 
behavior of the “negative work-spirit” sort 
is the fact that the difference is so small. By
virtue of the greater amount of freedom 
afforded by the activity program, the pupils 


TABLE XIII

FREQUENCIES OF CODED BEHAVIOR ITEMS. COMPARISONS OF ... SCORES OF THE 4TH, ...
6TH ... IN THE EIGHT PAIRS OF ACTIVITY AND CONTROL SCHOOLS ... IN THE STUDY ...
CRITICAL RATIOS ARE SHOWN IN ...

[Table printed sideways in the original. Only parts of the caption and the
semester column headings (Spring and Winter terms through Spring 1939) are
legible in this copy; the entries cannot be recovered.]


TABLE XIV

... OF BEHAVIOR ... COMPARISONS BETWEEN SCORES OF CHILDREN IN ACTIVITY AND
CONTROL SCHOOLS BASED UPON RESULTS FROM ... CLASSES AND GRADES ... DURING THE
COURSE OF THE INVESTIGATION. ... RATIOS SHOWN IN ...

[Table printed sideways in the original. Only parts of the caption are
legible in this copy; the entries cannot be recovered.]


in activity classes actually have considerably
more opportunity than the pupils in control
classes to commit acts which would fall in
the “negative work-spirit” category if they
had refractory inclinations. The quantitative
findings set forth in Table XIII indicate that
the activity pupils did not abuse this freedom
to any significant degree.

In Table XIII, results for the semesters
following the Spring of 1937 represent only
the 4th, 5th and 6th grade classes in the
schools which were included during the first
half-year of the study. Table XIV shows results
obtained from all classes included in the
study during five terms, including the 4th,
5th and 6th grade classes just mentioned as
well as other classes that were added.

In the tabulation of the results of the
coded observations, separate tables were
made to show the results at each of the separate
grade levels (4th to 6th inclusive) for
each of five semesters from 1937 to the spring
of 1939. These five tables are not reproduced
here, but such trends as they showed are
summarized below. Table XV gives comparisons
between 1st, 2nd and 3rd grade classes
that were included in the study during the
winter semester of 1938-39.

TABLE XV

FREQUENCIES OF CODED BEHAVIOR ITEMS. COMPARISON BETWEEN EIGHT 1ST, 2ND AND 3RD GRADE
ACTIVITY CLASSES AND EIGHT CONTROL CLASSES, BASED ON CODED
OBSERVATIONS, WINTER 1938-39

                                  Average scores (No. of
                                  activities per pupil per
                                  100 periods of observation)
                                                                            S. E.   Critical
                                    Activity   Control     Difference       diff.    ratio
Cooperative activities               14.63      15.62         -.99          6.45     -.15
Critical activities                   9.80       6.75         3.05          3.32      .92
Experimental activities              34.00      25.50         8.50          5.59     1.52
Leadership activities                 1.50        .32         1.18           .60     1.97
Recitational activities              49.62      73.00       -23.38         14.46    -1.62
Self-Initiated activities            11.25       3.75         7.50          3.87     1.94
Negative Work Spirit activities       1.44       1.75         -.31           .75     -.41

As might be expected, due to the large
number of factors involved, including differences
between observers and shifts in observer
personnel, the tables referred to above
show numerous fluctuations in average scores
in the various categories at the various grade
levels during the five school terms. However,
over and above such fluctuations the general
trend of the results appears to be quite constant.
There is no consistent or outstanding
indication that the activity program is having
different effects at different grade levels,


nor does it appear that there is any consistent 
change in one direction or the other, in any 
category or group of categories of behavior, 
with the passage of time. The data do sug- 
gest, incidentally, that in two categories of 
behavior (critical activities and experimental 
activities) the differences between activity 
and control classes are smaller in the case of 
the younger children in grades 1 to 3 than in 
the case of older children in grades 4 to 6. 
This trend is not conclusively established, but 
it is suggestive. The fact that critical activ- 
ities are relatively infrequent in the early 
grades, both in the activity and the non- 
activity groups, perhaps is associated with 
the fact that class discussions at this level 
are less likely than at later levels to take the 
form of an open forum or debate on a single 
issue or theme, and are more likely to take 
the form of separate, and often miscellaneous, 
reports or contributions by individual children.
The relatively high frequency of experimental
activities in the first three grades
in the control schools suggests that under
conventional circumstances there is more incentive
and opportunity for this type of activity
in the early grades than in the later
grades. 

In order to obtain more conclusive infor- 
mation as to possible changes that might come 
as children continue from year to year in one 
school regime or the other it would be valuable
to have results based upon repeated observations
of the same children. Due to the
large turn-over in the school population, and
the fact that the schools included in this study
represent only a small (and widely scattered)
fraction of all the schools in Greater New
York, it was not possible to make a study of
this sort on a substantial scale. Data of a
follow-up nature, but rather limited in scope,
are shown in Table XVI, which presents the
averages of the scores obtained by a group
of 59 children in control schools and a group
of 151 activity children during three successive
terms beginning in the spring term of
1937, and by groups of 145 control and 222
activity children who were observed during
three successive terms beginning in the spring
of 1938.

From Table XVI it appears that there is
a good deal of stability, from one term to the
next, in the frequency with which given children
display given types of behavior;
although there are substantial changes in
some scores there is little evidence of significant
shift from one term to the next, and
little indication of any tendency for the two
groups to draw apart in their scores as exposure
to the two educational environments
was continued over a longer period of time.
However, it is not possible to attach much
weight to this table, since, as was pointed out
earlier, the methods of observation used in
the study, while adequate for certain comparisons
between large populations, did not
afford reliable data on the behavior of individual
children.

TABLE XVI

FREQUENCIES OF CODED BEHAVIOR ITEMS. AVERAGE SCORES OF CHILDREN EACH OF WHOM WAS
OBSERVED DURING THREE CONSECUTIVE SEMESTERS

[Table layout: for each behavior category, average scores for the semesters
ending June 1937, Jan. 1938 and June 1938 (Control, N=59; Activity, N=151)
and for the semesters ending June 1938, Jan. 1939 and June 1939 (Control,
N=145; Activity, N=222). Most entries are illegible in this copy. Three
columns of figures survive for the first three categories; their column
assignment is uncertain, and figures ending in a bare decimal point are cut
off in the source.]

Cooperative activities      1.03   1.47   1.48
                            1.36   1.38   1.94
                            1.36   1.87   2.44
Critical activities          .92    .98   1.
                            1.70   1.18   2.
                            1.01   1.65   2.
Experimental activities     1.28   1.98   3.
                            1.16   1.77   3.
                            1.62   2.12   3.



ANECDOTAL OBSERVATIONS 
The anecdotal observations yielded an
average rating for each class for each of the
behavior categories observed. The class score 
was the average of the scores of the students 
in the class. Although designed to give a 
measure of quality, as distinguished from 
frequency, these ratings are a function, in 
part, of the number of recorded items as well 
as the quality of any given item. All subse- 
quent computations were made with these 
class scores, using them as the items of data. 
Statistical procedure consisted of finding: 


(1) The mean of the ratings for the total 
group of 32 to 48 activity classes. 

(2) The standard error of this mean. 

(3) The mean of the ratings for the total 
group of 32 to 48 control classes. 

(4) The standard error of this mean. 

(5) The difference between the mean of 
the activity classes and the mean of 
the control classes. 

(6) The standard error of this difference. 

(7) The critical ratio.
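
The seven steps listed above can be sketched as a single function. This is an illustration under stated assumptions, not the project's own computation: the standard error of each mean is taken as the sample standard deviation (N-1 denominator) divided by the square root of the number of classes, the two groups are treated as independent, and the function and key names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def compare_groups(activity, control):
    """Run steps (1) to (7) on two lists of class ratings."""
    mean_a = mean(activity)                      # (1) mean, activity classes
    se_a = stdev(activity) / sqrt(len(activity)) # (2) its standard error
    mean_c = mean(control)                       # (3) mean, control classes
    se_c = stdev(control) / sqrt(len(control))   # (4) its standard error
    diff = mean_a - mean_c                       # (5) difference of means
    se_diff = sqrt(se_a ** 2 + se_c ** 2)        # (6) S.E. of the difference
    return {
        "mean_activity": mean_a,
        "se_activity": se_a,
        "mean_control": mean_c,
        "se_control": se_c,
        "difference": diff,
        "se_difference": se_diff,
        "critical_ratio": diff / se_diff,        # (7) critical ratio
    }

# Hypothetical class ratings, for illustration only.
result = compare_groups([3, 4, 5], [1, 2, 3])
print(round(result["critical_ratio"], 2))
```

Because each class contributes one rating, the number of cases entering the standard errors is the number of classes (32 to 48 per group), not the number of children.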

Anecdotal observations were made during
five successive semesters, beginning in the
Spring of 1937. Each of these semesters of
observation represents an independent unit of
the project, and the results for each semester
were treated independently. In Table XVII
is given the essential information concerning
the results for the five semesters of the experiment.
The distribution of classes represented
in Table XVII is summarized in
Table I.

TABLE XVII

RATINGS OF QUALITY OF BEHAVIOR. RATINGS ASSIGNED TO ... IN ACTIVITY AND CONTROL
SCHOOLS IN VARIOUS ACTIVITIES. RESULTS ... ALL CLASSES AND ... COMBINED ...
SUCCESSIVE ... THE VALUES ...

[Table printed sideways in the original. Only parts of the caption are
legible in this copy; the entries cannot be recovered.]

Perhaps the most striking thing about 
Table XVII is the almost complete agreement 
in the results from semester to semester. 
Whatever the differences between the two 
groups of schools signify, these differences 
seem to be present with gratifying consist- 
ency. This consistency from semester to 
semester lends additional weight to differences 
which, in isolation, might not be deemed 
significant. 
The results show that the activity classes 
consistently surpass the control classes in 
quality of cooperative, critical, experimental, 
leadership and self-initiated activities. In each 
of these categories, the critical ratio exceeds 
three for at least one semester and the uni- 
formity of the differences between the groups, 
in direction and in size, enables us to say with 
complete assurance that these differences 
could not have arisen in samples chosen at 
random from the same total population. At 
the same time, the superiority of the control 
classes in recitational activities is uniform 
and unquestionable. The results on work- 
spirit show no clear-cut differences, and we 
have no basis for attributing superiority to 
either group. The observations of negative 
work-spirit were rather sketchy, and not too 
dependable, so that the failure to secure clear- 
cut results here may be a function of that 
fact. On the other hand, it is of course en- 
tirely possible that the changes in procedure 
represented by the activity program have no 
effect upon the behavior defined in the category
of work-spirit.
These observations started near the beginning
of the experiment when the activity program
had recently been initiated in the
schools, and have continued as the program
developed. It would be fascinating, therefore,
to try to trace developmental trends in
the data. Several factors, however, make
any such procedure extremely precarious. In
the first place, the pupils and teachers who
were observed in later semesters are generally
not the same as those who were observed
in the beginning semesters. The results are
not developmental in the sense that they follow
the same pupils or teachers through the
five semesters. In the second place, we must 
be aware of the possibility of fluctuation 
among observers over a period of two and a 
half years. Standards and skills in observing 
and recording may change over this length 
of time, in spite of efforts to maintain a uni- 
form standard. In the third place, the stand- 
ard of rating may change with the passage 
of time. This factor is likely to be particularly
important in the data presented above.
All the materials for a single semester were 
organized and rated during the months just 
after the close of that semester. Consequently, 
records for a given semester were rated in 
comparison with other materials from the 
same semester. The standard was relative to 
the material of a single semester, rather than 
to any more comprehensive array of material. 
It seems likely, then, that any progressive 
shifts in the classroom behaviors would be 
somewhat camouflaged by this relative stand- 
ard for rating. 


In view of the factors discussed above, any 
interpretation of trends must be extremely 
tentative. As a matter of fact, the striking 
thing is the high degree of uniformity from 
semester to semester. If we look at the col- 
umn of differences between means for the two 
groups, we find only one difference which is 
really suggestive. This is the tendency toward 
greater difference between the two groups in 
self-initiated activity. We can suggest as a 
possibility that as time has gone by, and the 
activity program has developed in these 
schools which we are studying, those activ- 
ities which are included in our category of 
self-initiated activities have come increas- 
ingly to the fore. 


As in the case of the coded observations, 
it was possible to obtain follow-up data on a 
limited number of children. Records were 
available for 121 activity and 59 control chil- 
dren for the three semesters ending in June 
1937, January 1938 and June 1938. Records 
were also available for another group consist- 
ing of 222 activity and 145 control children 
for the three semesters ending in June 1938, 
January 1939 and June 1939. Since we have 
no data as to the reliability of these individ- 
ual records (the reliability coefficients are 
based upon group averages rather than individual
scores) the results are not at all definitive.
In Table XVIII the average rating for
each semester is presented. This can be compared
with the average for the same children
for two other semesters. The results for the
activity and control children and for the two
sequences of semesters are presented separately.

The results show some quite large shifts
from one semester to another. The critical
ratios (omitted from the table because of
space limitations) indicate that a number of
these differences are significant and that some
factor is making for a real change in the
scores of the children in different semesters.
However, the changes rarely seem to represent
a progressive trend over the three consecutive
semesters. A gain in the second
semester is frequently followed by a loss in
the third. Furthermore, the results from the
two sets of data are contradictory as often as
not. Gains in one set are paralleled by losses
in the other. It would seem that the shifts
must be due to factors other than the cumulative
effect of a particular type of education,
factors essentially irrelevant to the concern
of this experiment. It does not seem possible
to identify any differential effect of the two
programs upon individual development from
this material.

TABLE XVIII

RATINGS OF QUALITY OF BEHAVIOR. AVERAGE RATINGS OF CHILDREN EACH OF WHOM WAS
OBSERVED DURING THREE CONSECUTIVE SEMESTERS

                                   1937-38 Study                    1938-39 Study
                            Semester    Control  Activity    Semester    Control  Activity
                            ending       N=59     N=151      ending       N=145    N=222
Cooperative activities      June 1937    2.15     2.82       June 1938    2.70     3.39
                            Jan. 1938    2.80     5.01       Jan. 1939    2.14     4.50
                            June 1938    2.01     4.37       June 1939    2.24     3.70
Critical activities         June 1937    2.34     2.22       June 1938    1.[?]    2.13
                            Jan. 1938    2.96     3.65       Jan. 1939    2.75     4.55
                            June 1938    2.49     2.71       June 1939    2.11     3.54
Experimental activities     June 1937    1.92     5.90       June 1938    7.56     8.18
                            Jan. 1938    7.86     5.32       Jan. 1939    6.67     8.76
                            June 1938    8.26     7.07       June 1939    7.24     6.54
Leadership activities       June 1937    1.38     3.01       June 1938     .67     1.64
                            Jan. 1938     .85     1.49       Jan. 1939     .56     1.23
                            June 1938     .93     1.22       June 1939     .58      .95
Recitational activities     June 1937    6.15     4.49       June 1938    9.37     5.23
                            Jan. 1938    8.35     4.53       Jan. 1939    8.68     5.80
                            June 1938    7.99     4.61       June 1939    7.81     4.84
Self-Initiated activities   June 1937    1.90     4.03       June 1938    3.47     2.78
                            Jan. 1938    3.01     4.51       Jan. 1939    3.52     5.28
                            June 1938    3.03     4.01       June 1939    2.59     5.19
Work Spirit activities      June 1937     .89      .80       June 1938    1.47     1.04
                            Jan. 1938     .93      .99       Jan. 1939     .74*
                            June 1938    1.53     1.34       June 1939     .63*

* Only one of the two 1938-39 values is legible for this semester; its
  column (control or activity) is uncertain.


COMPREHENSIVE ACHIEVEMENT TEST 


The Comprehensive Achievement Test was 
administered to the children in the activity 
and control schools near the end of the spring 
semester of 1937 and again near the end of 
the spring semester of 1938. The results for 
1937, reported in terms of grade scores, are 
presented in Table XIX. The results in 
Table XIX indicate that the two total groups 
were very nearly equal in performance in this 
test at that time. The difference of about one 
tenth of a grade in favor of the control group 
corresponds almost perfectly with the differ- 
ence in average performance on the intel- 
ligence test given to the children making up 
these groups. 

Table XX presents a comparison of the 
activity and control classes on the Comprehensive
Achievement Test administered in
the spring of 1938. Results here are presented
both in raw scores and in grade scores. A
comparison of Table XX with Table XIX
shows that the differential between the control
and the activity groups has become
larger. The control group now exceeds the
activity group in performance upon this test
by almost half a grade. The critical ratio, 




computed for the raw scores, shows that this 
difference does not have complete statistical 
reliability. Inasmuch as these are not 
matched groups, we must be cautious in draw- 
ing any inference, but the suggestion is that 
children in the control schools, in comparison 
with the children in the activity schools, are 
making equal or greater progress on the total 
of the broad variety of achievements covered 
by this test. : 
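
The critical ratios reported in these tables can be checked by dividing each difference by its standard error. The short Python sketch below does this for the combined fourth, fifth and sixth grade raw-score averages of Table XX. The function name is illustrative only, and the threshold of three is an assumption drawn from the later remark that none of the 1939 critical ratios "is as large as three"; the report does not state its own formula.

```python
def critical_ratio(mean_a, mean_b, se_diff):
    """Return (difference, critical ratio) for two group averages.

    The critical ratio is taken here as the difference between the
    averages divided by the standard error of that difference.
    """
    diff = mean_a - mean_b
    return diff, diff / se_diff


# Combined 4th, 5th and 6th grade raw-score averages from Table XX:
# activity 48.91, control 50.75, S.E. of the difference 1.12.
diff, cr = critical_ratio(48.91, 50.75, 1.12)
print(f"difference = {diff:.2f}, critical ratio = {cr:.2f}")

# Assumed convention: a critical ratio of about 3 or more is "reliable".
RELIABILITY_THRESHOLD = 3.0
is_reliable = abs(cr) >= RELIABILITY_THRESHOLD
print("reliable" if is_reliable else "not reliable")
```

Run on these figures, the sketch reproduces the tabled difference of -1.84 and critical ratio of -1.64, which falls short of the assumed threshold.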

One significant fact is that both groups did
very much better on this test in 1938 than
in 1937. In the activity group the difference
amounted to about nine-tenths of a grade,
and in the control group the difference
amounted to about one and three-tenths
grades. Whether this difference is to be ex-
plained in terms of specific practice with the
test, discussion of the contents of the test
after it was administered the first time, more
general attention by the teachers to the types
of outcomes considered in the test, or some
other factor or pattern of factors cannot be
determined from our data. These large shifts
in level of performance in the two groups
throw doubt upon the validity of comparisons
between the two groups on this test.

The very large change in performance on 
the test between the 1937 and the 1938 test- 
ing is further brought out by the records of 
a small sample of individuals for whom rec- 
ords of two successive tests were located. 
These results are presented in Table XXI. 
We see that these pupils made a gain in one 
year of more than three school grades, as 
represented by the grade scores on this test. 
Such a change would indeed be phenomenal 
if it were truly representative of the general 
level of educational accomplishment of these 
children. 

We see from Table XXI that the gains in 
the control and activity schools are quite 
similar. Whatever small difference there is 
seems to favor the control school. 


TABLE XIX

GRADE SCORE AVERAGE FOR COMPREHENSIVE ACHIEVEMENT TEST, SPRING 1937

                            4A     4B     5A     5B     6A     6B     All
Activity
  Average grade score      4.26   4.57   5.15   5.77   6.48   6.70   5.45
Control
  Average grade score      4.21   4.74   5.18   6.07   6.41   6.93   5.54


TABLE XX

AVERAGE RAW SCORES AND GRADE SCORES IN COMPREHENSIVE ACHIEVEMENT TEST,
SPRING 1938. GRADE SCORES ARE SHOWN IN PARENTHESES

                           No. of classes       Average raw scores            S.E.    Critical
Grade                     Activity  Control   Activity  Control  Difference   diff.    ratio
4th                          37       42       42.24     44.36     -2.12      1.70     -1.25
                                               (5.15)    (5.54)
5th                          38       46       49.26     50.83     -1.57      1.63      -.97
                                               (6.45)    (6.85)
6th                          36       46       55.66     56.00                1.45      -.23
                                               (7.93)    (8.00)
Combined 4th, 5th, 6th      111      134       48.91     50.75     -1.84      1.12     -1.64
                                               (6.38)    (6.83)


TABLE XXI

COMPREHENSIVE ACHIEVEMENT TEST SCORES OF CHILDREN TESTED IN JUNE, 1937,
AND RETESTED IN JUNE, 1938

Average score, June 1937      47.48    6.10
June 1938
Difference                    13.08    3.01
                              13.84    3.35
S.E. of diff.                  1.57
Critical ratio                 8.33    9.11
Activity (N=70)
Control                       50.68    2.05    6.75




TABLE XXII

AGE AND INTELLIGENCE OF EQUATED GROUPS TESTED BY MEANS OF COMPREHENSIVE
ACHIEVEMENT TEST, JUNE, 1937

                              No. of    Average   Standard
                Schools       pupils     score    deviation
Age             Activity       460       10.54      1.13
                Control        460       10.56      1.11
Intelligence    Activity       460       72.40     15.95
                Control        460       72.15     15.95

TABLE XXIII

COMPREHENSIVE ACHIEVEMENT TEST SCORES OF MATCHED GROUPS, JUNE, 1937

                    No. of      Average scores                  S.E. of   Critical
Grade               pupils    Activity   Control   Difference    diff.     ratio
4th                  131       42.64      41.48       1.16       1.03      1.13
5th                  153       49.44      47.92       1.52        .81      1.88
6th                  176       50.45      47.50       2.95       1.16      2.54
Combined 4th,
  5th & 6th          460       47.45      45.50       1.95        .50      3.90


The populations discussed so far were not 
equated for age and intelligence, and so it is 
possible that differences in performance upon 
the Comprehensive Achievement Test may 
have been due to differences in important 
background variables. An attempt was made 
to test this in the 1937 data. Individuals in 
the control schools were matched with indi- 
viduals in the activity schools who were in 
the same school grade and of approximately 
the same age and intelligence. A population 
was made up of these individually matched 
pairs. Table XXII presents the character- 
istics of the control and activity groups in 
this population, and Table XXIII presents 
the results for this matched group on the 
Comprehensive Achievement Test. In con- 
trast with the results for the total group, 
where it was found that the control group 
slightly excelled the activity group, we find in 
the present instance a reliable difference in 
favor of the activity group. Unfortunately, 
data for such a matched population are not 
available for the 1938 testing, so it is impos- 
sible for us to say whether this superiority 
has been maintained. 
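
The matching procedure described above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the `Pupil` record, the greedy nearest-partner rule, the tolerances for "approximately the same age and intelligence", the made-up pupils, and the use of the standard deviation of the paired differences to obtain the standard error. The report does not say how pairs were selected or how the standard error of the difference was computed.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev


@dataclass
class Pupil:
    grade: int
    age: float    # years
    iq: float     # intelligence test score
    score: float  # achievement test raw score


def match_pairs(activity, control, age_tol=0.25, iq_tol=3.0):
    """Greedily pair each activity pupil with the closest unused control pupil.

    A control pupil qualifies only if in the same grade and within the
    (hypothetical) age and intelligence tolerances.
    """
    unused = list(control)
    pairs = []
    for a in activity:
        candidates = [
            c for c in unused
            if c.grade == a.grade
            and abs(c.age - a.age) <= age_tol
            and abs(c.iq - a.iq) <= iq_tol
        ]
        if not candidates:
            continue  # no acceptable partner, so this pupil is left out
        best = min(candidates, key=lambda c: (abs(c.iq - a.iq), abs(c.age - a.age)))
        unused.remove(best)
        pairs.append((a, best))
    return pairs


def paired_critical_ratio(pairs):
    """Mean activity-minus-control difference, its S.E., and their ratio."""
    diffs = [a.score - c.score for a, c in pairs]
    se = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs), se, mean(diffs) / se


# Made-up pupils, for illustration only.
activity = [Pupil(4, 9.5, 70, 44), Pupil(4, 9.8, 75, 47),
            Pupil(5, 10.6, 72, 50), Pupil(6, 11.9, 90, 60)]
control = [Pupil(4, 9.6, 71, 42), Pupil(4, 9.7, 74, 46),
           Pupil(5, 10.5, 73, 47), Pupil(6, 11.0, 60, 40)]

pairs = match_pairs(activity, control)
mean_diff, se_diff, cr = paired_critical_ratio(pairs)
print(len(pairs), round(mean_diff, 2), round(se_diff, 2), round(cr, 2))
```

In this toy example the sixth grade pupil finds no partner within the tolerances and is dropped, which is how a matched population can end up smaller than either total group.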


WRIGHTSTONE TESTS 


The battery of tests prepared by Dr. 
Wrightstone for use in this study was admin- 
istered toward the end of the spring semester 
in 1937, 1938 and 1939. The results of these 
three testings are presented in Tables XXIV, 
XXV and XXVI, respectively. For 1937, data 
are available upon a population of matched 
pairs. For 1938 and 1939 the data refer to
total class groups, with the group considered
as the unit, without individual matching.
The individually matched pairs probably pro- 
vide a more valid and sensitive test of the 
outcomes in the two groups. 

We may say in general that all the 1937 
comparisons favor the activity groups. For 
the 1938 and 1939 comparisons, no one of 
the differences is statistically significant, due 
in part to the small number of class groups. 
In 1939, however, two of the seven compar-
isons slightly favor the control group, four
of the comparisons favor the activity groups,
and in one of the tests the two groups are
equal.

The tests of social beliefs and attitudes, 
knowledge of current affairs and personal 
adjustment have shown consistent differences 
in favor of the activity group but the differ- 
ences are not statistically reliable in 1938 or 
1939. 

A further glance at Tables XXIV, XXV 
and XXVI will reveal certain vagaries. For 
example, in “working skills”, the average 
score of the activity classes drops from 27 
in 1937 to 22.66 in 1938 and then rises to 
26.50 in 1939. The corresponding control 
group averages drop from 25.26 to 21.62 and 
then rise to 27.04. These fluctuations from 
year to year probably are due mainly to dif- 
ferences in difficulty of successive forms of 
the test. 


TABLE XXIV

AVERAGE SCORES ON WRIGHTSTONE TESTS, JUNE, 1937. BASED ON TESTS OF CHILDREN MATCHED
WITH RESPECT TO AGE, SEX, GRADE AND INTELLIGENCE TEST SCORES

                                        Average scores                  S.E.    Critical
Test                        Number   Activity   Control   Difference   diff.    ratio
Working Skills               382      27.00      25.26      1.74                 5.37
Explaining Facts                      19.46      18.86       .60        .35
Applying Generalizations     376      40.90      40.35       .55        .44      1.26
Current Affairs              387      32.86      32.02       .84        .43      1.97
Social Beliefs                        35.18      33.92
Personal Adjustment                   56.90      55.45      1.45
Social Adjustment            361      48.48      46.30      2.18        .63      3.48


TABLE XXV

AVERAGE SCORES ON WRIGHTSTONE TESTS, JUNE, 1938. BASED ON TESTS OF CHILDREN MATCHED
WITH RESPECT TO AGE, SEX, GRADE AND INTELLIGENCE TEST SCORES

                            No. of      Average scores                  S.E.    Critical
Test                        classes  Activity   Control   Difference   diff.    ratio
Working Skills                        22.66      21.62      1.04       1.32       .79
Explaining Facts                      19.95      19.08       .87       1.11       .78
Applying Generalizations*     36      12.63      12.46       .17        .79       .22
Current Affairs               36      31.72      29.72      2.00       1.43      1.40
Social Beliefs                36      33.55      31.75      1.80       1.17      1.54
Personal Adjustment           36      57.28      55.94      1.34       1.13      1.19
Social Adjustment             36      46.22                             .98      -.08

* For non-comparability with 1937 score, see earlier description of test.


TABLE XXVI

AVERAGE SCORES ON WRIGHTSTONE TESTS, JUNE, 1939

                            No. of      Average scores                  S.E.    Critical
Test                        classes  Activity   Control   Difference   diff.    ratio
Working Skills                        26.50      27.04      -.54       1.48      -.36
Explaining Facts                      20.50      20.76      -.26        .97
Applying Generalizations      24      12.88      12.88       0          .16       0
Current Affairs               24      31.34      29.84      1.50       1.56       .96
Social Beliefs                24      33.66      33.04       .62       1.07       .58
Personal Adjustment           24      57.84      55.00      2.84       1.80      1.58
Social Adjustment                     47.25      45.89                            .75


MODERN SCHOOL ACHIEVEMENT TESTS


The Modern School Achievement Test was
administered to the pupils in the activity and
control classes in June 1937, January 1938
and January 1939. The results from these
three testings are presented in Tables XXVII,
XXVIII and XXIX.

From the children tested in June 1937, a
matched group was made up. Each child in
an activity school was paired with a child
from a control school on the basis of age, sex,
grade and intelligence test score. The results
for this population of 454 matched pairs are
presented in Table XXVII. This group is
made up of 130 fourth grade children, 151
fifth grade children and 173 sixth grade chil-
dren.

Considering the total group, we find the
differences in reading to be small and unre-
liable. The control group is ahead on both
arithmetic tests. One of the differences sat-
isfies the customary criterion of statistical re-
liability, and the other approaches it. The
results on the spelling and language usage
tests both favor the activity group, but
neither of these differences is large enough
to be statistically significant.

A separate analysis was made of each grade
group, but the tables were not deemed to be
of sufficient importance for inclusion here.
There was a good deal of fluctuation from
grade to grade, but this was generally quite
irregular. The results suggested that the
greatest arithmetic deficit for the activity
group was in the fourth grade and the small-
est in the sixth, but the differences were not
large enough to be at all conclusive.


A group of matched pairs was again assem- 
bled as a basis for the analysis of the tests 
given in January 1938. The 278 pairs in this 
group included 121 fourth grade, 123 fifth 
grade and 34 sixth grade pairs. This popula- 
tion is obviously different in grade composi- 
tion from that studied in June 1937. 


For the January 1938 and the January
1939 testing, grade scores rather than raw
scores were available. Since some of the ex-
treme grade scores were recorded merely as
Low, High and Very High, and consequently
were undistributed, medians were used as a
measure of central tendency. The median
grade scores for the activity and the control
group are presented in Table XXVIII.

From Table XXVIII it appears that the
activity children now fall behind the control
children on each sub-test included in the total
test. The deficits in reading comprehension
and language usage are small and unreliable.
The deficits in reading speed and spelling are
of the order of one fourth of a school grade,
but still are not statistically reliable. In arith-
metic computation and arithmetic reasoning
the control children surpass the activity chil-
dren by half a grade, and the difference seems

TABLE XXVII

MODERN SCHOOL ACHIEVEMENT TEST RAW SCORES OF MATCHED CHILDREN, JUNE, 1937. BASED
ON SCORES OF PAIRS OF CHILDREN NUMBERING APPROXIMATELY 130 IN THE FOURTH,
150 IN THE FIFTH AND 170 IN THE SIXTH GRADE

                           No. of     Average raw score                S.E.    Critical
Test                      children  Activity   Control   Difference   diff.    ratio
Reading Comprehension       454      40.85      40.55       .30        .53       .57
Reading Speed               453      28.02      28.18      -.16        .52      -.31
Arithmetic Computation      454      16.84      17.76      -.92        .26     -3.61
Arithmetic Reasoning        453      11.68      12.38      -.70        .29     -2.46
Spelling                    453      44.90      43.75      1.15        .67      1.73
Language Usage              453      28.35      27.57       .78        .37      2.09

TABLE XXVIII

MODERN SCHOOL ACHIEVEMENT TEST GRADE SCORES OF MATCHED CHILDREN, JANUARY, 1938.
BASED ON SCORES OF PAIRS OF CHILDREN NUMBERING RESPECTIVELY, 121, 123
AND 34 IN THE FOURTH, FIFTH AND SIXTH GRADES

                              Median score                   S.E.    Critical
Test                       Activity   Control   Difference   diff.    ratio
Reading Comprehension        6.14       6.19      -.05                 -.33
Reading Speed                5.78       6.07      -.29        .21     -1.38
Arithmetic Computation       5.81       6.32      -.51        .12     -4.25
Arithmetic Reasoning         4.92       5.41      -.49                -4.50
Spelling                     5.74       6.02      -.28                -2.60
Language Usage               5.55       5.70      -.15        .19      -.7

TABLE XXIX

MODERN SCHOOL ACHIEVEMENT TEST GRADE SCORES OF ACTIVITY AND CONTROL
CLASSES IN GRADES 4-6, JANUARY, 1939

                           No. of classes       Median score                  S.E.    Critical
Test                      Activity  Control  Activity  Control   Difference   diff.    ratio
Reading Comprehension       112      129       5.21      5.72      -.51        .25     -2.04
Reading Speed               110      121       5.18
Arithmetic Computation      112      127       5.40      5.85      -.45        .19     -2.87
Arithmetic Reasoning        111      128       4.60      5.00      -.40        .19
Spelling                    112      129       5.45                                    -1.18
Language Usage              111      127       5.05      5.46      -.41                -2.2


clearly to be larger than could have arisen by
chance.

The separate grade groups, tables for
which have not been included, were too small
to give trustworthy comparisons. In this test-
ing, in contrast with that of June 1937, the
sixth grade activity group showed the greatest
deficit on four of the six tests, including the
arithmetic tests.

No group of matched pairs was assembled 
for the January 1939 testing, so the analysis 
has been made with a much larger, but un- 
matched group of classes. The results are 
shown in Table XXIX. Grade scores and 
medians were used in analyzing these data, 
as was the case with the January 1938 data. 
Each median is computed from the medians 
of the component classes. 
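
The reason medians suit these data can be shown with a small Python sketch. A median needs only the rank order of the scores, so entries recorded merely as Low, High or Very High can still be used, and the group figure can then be taken as the median of the class medians. The coding of the undistributed entries as negative or positive infinity, the function names and the sample scores are all assumptions made for illustration; they are not taken from the study's records.

```python
from statistics import median

# Assumed coding: "Low" ranks below every numeric grade score;
# "High" and "Very High" rank above every numeric grade score.
UNDISTRIBUTED = {
    "Low": float("-inf"),
    "High": float("inf"),
    "Very High": float("inf"),
}


def class_median(scores):
    """Median grade score of one class, tolerating undistributed entries."""
    ordered = [UNDISTRIBUTED[s] if isinstance(s, str) else float(s) for s in scores]
    return median(ordered)


def group_median(classes):
    """Median of the class medians, with the class treated as the unit."""
    return median(class_median(scores) for scores in classes)


# Made-up grade scores for three classes.
classes = [
    [4.8, 5.2, "Low", 6.1, "Very High"],
    [5.0, 5.6, 6.3],
    ["Low", 4.9, 5.4],
]
class_medians = [class_median(c) for c in classes]
overall = group_median(classes)
print(class_medians, overall)
```

A mean could not be computed for the first and third classes without inventing numeric values for the undistributed entries, whereas their medians are unaffected so long as those entries lie outside the middle of the ranking.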

At the time of this testing, the activity
group shows a deficit on every sub-test, rang-
ing from a fifth to half a grade on the dif-
ferent sub-tests. The difference between the
two groups in arithmetic scores continues to 
be large. The other large differences are in 
reading comprehension and language usage. 
These were the two tests which showed the 
smallest difference in 1938. None of the crit- 
ical ratios in the 1939 results is as large as 
three, due in part to the insensitivity of sta- 
tistics based upon whole classes as units. 
Within the limitations of our sampling, how- 
ever, the results suggest that the control 
classes surpass the activity classes in the 
skills measured by these tests. 

In summary, the cumulative evidence of 
these testings seems to leave little doubt that 
the children in the activity classes lag appre- 
ciably behind children in the control classes 
in some of the educational outcomes which 
are measured by an achievement test of aca- 
demic skills. The differences in reading speed 
and comprehension are generally small and 
unreliable. The differences in arithmetical 
computation and arithmetical reasoning are 
rather substantial and have uniformly favored 
the control group from the time of the first
testing. In spelling and language usage the 
differences are fairly small, favoring the activ- 
ity group at the first testing but favoring the 
control group at later testings. The results of 
second and third testings are less favorable 
to the activity group than the results from 
the first testing, and it seems that the longer 
the activity program has been in operation, 
the larger is the relative superiority of the 
control over the activity classes in these tests. 




PART V 


RECAPITULATION AND DISCUSSION 
OF RESULTS 


The findings presented in the preceding 
chapter raise several questions. It was 
pointed out that the children in activity 
classes, while surpassing the control children 
on many of the measurements made in this 
study, lag behind the control classes in ac- 
complishment in certain tool subjects (mainly 
arithmetic and spelling). In the meantime, 
the activity children have undertaken the 
additional projects involved in the activity 
program. What are these additional projects 
and experiences involved in the program of 
the activity as contrasted with the control 
schools? To what extent are the objectives 
and supposed values of these projects and 
experiences being realized? Assuming that 
the objectives are being realized to a substan- 
tial degree, do the benefits thus obtained out- 
weigh the benefits which accrue to the con- 
trol children by virtue of their superiority in 
certain tool subjects, such as arithmetic? 

Unfortunately, it is not possible to give a 
definitive answer to any of these questions. 
It is possible, however, to set forth observa- 
tions that give at least a partial answer. 
What follows is based in part on interpreta- 
tion of results already reported, in part upon 
impressions obtained during visits to class- 
rooms, and in part on informal reports sub- 
mitted by the observers. It incorporates parts 
of a manuscript prepared by the senior 
author, based upon informal records of visits 
to about a hundred classrooms. This report 
was submitted to all of the observers for 
comment, criticism and supplementation. 


Differences in Procedures and Daily Pro- 
grams in Activity as Compared With Control 
Schools —A systematic observational study 
of what actually goes on from hour to hour 
and from day to day in the activity as com- 
pared with the control classrooms has not 
been made. The results of the coded obser- 
vations in themselves reveal some differences 
that reflect dissimilarities in practice; this is 
notably apparent in the values showing the 
relative frequencies of recitational activities. 
The coded observations do not, however, ade- 
quately reflect many of the differences which 
are apparent to anyone who observes a num- 
ber of classes in the two groups. Moreover, 
the school practices questionnaire likewise
appears not to reveal fully the differences that
might prevail between the two regimes, for 
the items of this instrument elicit information 
as to whether or not this or that practice has 
occurred at all during a period of time rather 
than information as to the actual frequency 
of the practice or the extent to which it per- 
meates the daily life of the class. 


Many of the differences between the two 
groups are hard to define, but among other 
things it could be observed that the activity 
classes, on the whole, exhibited more partici- 
pation by the pupils in class discussion and 
debate, more participation by the pupils in 
the planning and direction of some or all the 
features of the work of the class (partly by 
way of pupil chairmanships, committees 
charged with responsibilities and duties of 
various kinds, etc.), more freedom on the part 
of the pupils, individually or in small groups, 
to work on projects of their own; more free- 
dom (usually during some periods of the day 
more than others) to move about, to consult 
with other pupils, to approach the teacher 
and converse with her; more opportunity for 
the pupils to evaluate and criticize the per- 
formance of members of the class; more 
apparent opportunity for individual pupils to 
exercise and obtain recognition for their own 
interests and aptitudes, including the oppor- 
tunity to do “research” on special aspects of 
subject matter that is being considered by the 
class as a whole, and encouragement to tap 
sources of information outside the traditional 
text-books by way of bringing in reports, 
illustrations, clippings, etc., from sources not 
assigned to the class as a whole; more em- 
phasis on field trips. 


Features which especially seemed to dis-
tinguish the “average” activity from the “av-
erage” control class are, first, more outward
appearance of pupil self-direction; second,
more diversity and a larger range of occupa-
tion, especially during certain periods of the
day; third, more projects of the sort that cor-
relate various enterprises and skills as dis-
tinguished from the study of isolated subject
matter; fourth, a considerably larger display
of the pupils’ handiwork (drawings, models,
work books, posters, class journals, class
anthologies, etc.).

It should be emphasized that the foregoing
account represents a general description of the
scene as witnessed by an observer, rather
than a quantitative inventory. Further, it
should be reiterated that there are large dif-
ferences between individual classes under
different teachers, not only in the more obvi-
ous manifestations but also in the more subtle
aspects of the procedure. Indeed, observers
have noted that there may be a decided dis-
crepancy between semblance and substance;
for example, a class which exhibits the exter-
nals of democratic management may actually
be as thoroughly dominated by the teacher as
a class which shows none of these formalities
of pupil self-direction. Again, observers have
been impressed by the apparently fine spirit
of comradeship and freedom which may pre-
vail in a class in which the teacher observes
but few of the outward forms of democratic
management. It may further be mentioned
that many teachers in supposedly conven-
tional classrooms find means of incorporating
many of the practices which are frequent in
the activity schools.* Some of the observers
in this study actually commented on the fact
that there seemed to be an increase with
time, in some control classes, in the adoption
of “activity” practices. To be sure, such prac-
tices no doubt might, in many cases, be attrib-
uted to the personality and resourcefulness of
the individual teacher, rather than to an infil-
tration of a new or different theory of edu-
cation. However this may be, in the case of
individual classes, it still can be maintained
that the general run of classes in the two
groups exhibit large differences along lines
such as mentioned above.

Now, assuming that such differences exist,
two questions arise. First, to what extent do
the quantitative findings in the present study
give a measure of the outcomes of the pro-
visions supplied in the program of the activ-
ity schools? Secondly, and quite as impor-
tant, to what extent do the methods in the
present study fail, perhaps, to tap these out-
comes? These questions call for a review of
the findings presented in the preceding sec-
tion; they call also for an appraisal of the
findings, and a consideration of the extent to
which these findings adequately reflect and
evaluate the manifold differences between the
two groups.

Adequacy of Instruments for the Measure-
ment of Outcomes of Activity Program. It
may be pointed out that an investigator faces
a difficult problem when he is called upon to


* Records of class visits indicate, for example, that some
of the most impressive and apparently valuable class discus-
sions, including interesting exchanges between members of the
class, were observed during visits to control schools.




appraise two presumably quite different edu-
cational regimes. To obtain comparable data,
he must apply similar measurements and pro-
cedures. In a study involving large numbers
of classes, this means, in effect, that his meas-
urements necessarily must be rather gross;
his methods cannot be adapted to the circum-
stances that prevail in individual classes. The
greater the diversity between the activities in
different classrooms, the greater the freedom
under which the teachers and pupils operate,
the greater the range of individual projects
which classes as a whole, or individuals within
a given class, may undertake, the less ade-
quately do gross measurements, similarly
applied to all, reflect or appraise the perform-
ances which the pupils have undertaken.
Stated in other words, it may be contended
that the measurements used in this study,
especially those dealing with intellectual
skills, are relatively less well adapted
to the activity classes than to the control
classes. These tests measure certain perform-
ances that are common both to activity and
control classrooms. But the average activity
class, apart from concentration on such per-
formances, also dipped into a larger variety
of additional undertakings than did the aver-
age control school.


Some Features Not Measured by the
Instruments That Were Used.—For example, 
as indicated above, it appeared that the
pupils in activity classes undertook more
drawing, dramatization, and other artistic en-
deavors in connection with their studies than
did the children in control schools; but the
tests do not measure accomplishments in
these areas. Similarly, there is no measure of
the manner in which greater opportunity for
class discussion may have affected the pupils’
abilities in oral composition. Further, the
projects undertaken during “activity” periods
might involve many special skills and the
acquisition of a considerable amount of in-
formation on specific topics which are not
touched at all in the general tests of infor-
mation or achievement. In other words, while
the scores measure the extent to which chil-
dren in both groups are versed in matters
common to the course of study in both groups,
they do not measure the additional or extra
learnings and skills which children in the more
flexible activity program may have acquired.
This may be pointed out even though it is
recognized that experiences which were not
tapped by the standardized tests were not
restricted exclusively to the activity schools.
Moreover, it is not implied that such learn-
ings are necessarily more valuable than
mastery of the conventional school subjects.

Unfortunately, we have no measurements 
of the benefits the activity children may have 
obtained from specialized and individual
projects such as those described above. We 
therefore have no concrete evidence on the 
basis of which to answer the question: To 
what extent would the benefits thus obtained 
offset, or perhaps more than offset, the ad- 
vantage which control pupils have in the con- 
ventional academic subjects, notably arith- 
metic? To obtain such evidence, it would be 
necessary to employ methods of study con- 
siderably more intensive than the methods 
that have been used so far. 


Results of Tests of Study Skills, Etc.—It 
might be maintained that the Wrightstone 
tests should yield some evidence on this score. 
If the undertakings in activity classes serve 
to sharpen the children’s mental processes 
and enable them to think more clearly and 
effectively in dealing with intellectual prob- 
lems, one might, theoretically at least, expect 
that the activity children would earn sub- 
stantially higher scores in the tests dealing 
with working skills in the social studies, abil- 
ity to explain facts in the social studies, and 
ability to apply generalizations. Actually, 
however, the activity pupils do not show pro- 
gressive gains from year to year in these three 
tests; on the contrary, on the most recent 
tests, the activity and control groups are 
practically equal, with the latter group hold-
ing a slight edge on two of the tests. Accord-
ingly, as far as the evidence from this angle
is concerned, it does not appear that the en- 
terprises undertaken by the activity children 
produce learnings of a sort that may be trans- 
ferred to the solving of problems, such as 
those raised in the Wrightstone tests, in such 
a manner as to give the activity pupils an 
advantage over the control pupils. 

Might some other types of tests, more 
directly adapted to the children’s day to day 
intellectual activities, reveal a greater supe- 
riority in the activity pupils? To this ques- 
tion, obviously, the present results supply no 
answer. The available evidence does suggest, 
however, that if tests could be designed to 
measure the intellectual gains derived from 
the activity projects, they probably would
show that these gains consist partly in greater
knowledge and competence in connection with 
specific enterprises and subject matters that 
have been studied, rather than in a broadly 
generalized improvement in the mental 
processes involved in “good thinking.” The 
question as to the possible extent and the 
value of such specific learnings remains un- 
known, and we cannot, on the basis of the 
present results, conjecture as to what extent 
the values from such learnings offset, or more 
than offset, the superiority of the control 
children in some academic skills. 


Results of Other Wrightstone Tests —The 
remaining tests of the Wrightstone series, and 
the results obtained from the observational 
studies, give a more positive picture. The 
activity pupils are consistently superior to 
the control pupils in knowledge of current 
events, although the difference is not large, 
nor is it statistically reliable. The activity 
children likewise consistently exhibit a better 
record of personal adjustment, as measured 
by test results, although here again the dif-
ference is not substantial or reliable. During
two of the three years when the test was
applied, the activity children likewise surpass 
the control pupils in social adjustment; this 
difference, again, is not large and during one 
year the test results for the two groups are 
practically equal, with a slight but negligible 
advantage for the controls. 


Results of Observational Studies —The 
most outstanding differences between the two
groups appear in the results of the observa-
tional studies. On the basis of the most com-
plete table (Table XIV), showing results for
five school terms, it can be seen that the ac-
tivity children exhibit two to three times as
many “self-initiated activities” as do the con-
trols. In “critical activities” the activity chil-
dren surpass the controls to the extent of
about two to one. The activity pupils like-
wise substantially surpass the controls in fre-
quency of “experimental activities”, especially
during the last three semesters of the study.
They consistently show a larger frequency of
“leadership activities”; the frequencies here
are relatively small for both groups, but the
activity children show a statistically reliable
superiority during three of the five terms.
During three of the five terms the activity
children surpassed the controls in frequency
of “cooperative activities”; during no term
was the difference in frequency between the
two groups at all large or statistically reli-
able, but according to the ratings of anecdotal
records the activity children consistently, and
to a substantial degree, surpassed the controls
in the quality of their cooperative activities
(Table XVII). In “recitational activities”,
as pointed out above, the control children
consistently show considerably larger fre-
quencies than the activity children. In the
category “negative work-spirit”, the control
children show lower frequencies than do the
activity children, during three of the five
semesters, but the gross frequencies are small,
the differences are also small, and at no time
were they statistically reliable. Moreover,
the qualitative ratings of this behavior show
inconsistent and inconclusive trends. (Fur-
ther comments on this category will be set
forth below.)


Taken at face value, the picture emerging 
from the observational data may be regarded 
as highly favorable to the activity groups 
In terms of frequencies, the quantitative rec- 
ords indicate that the activity program has 
accomplished its purpose of giving the pupils 
more opportunity and incentive for initiative, 
experimentation, exercise of leadership, and 
exercise of critical thinking. To be sure, it 
might be argued that the quantitative supe. 
riority of the activity pupils in these cate- 
gories is no more than one would expect from 
the fact that the activity program, by its very 
nature, provides more opportunity for the 
display of acts in these categories. But when 
the quantitative findings are considered in 
conjunction with the qualitative ratings, it ap- 
pears that the activity children not only show 
more behavior of this sort but also show a 
better quality of performance in the cate- 
gories just mentioned, as well as in the cate-
gory of cooperative behavior. However, as
mentioned earlier, the qualitative ratings are
influenced, to some extent, by the number of
items recorded and thus are not entirely dis- 
crete from the quantitative scores. 


Observations Concerning Discipline and 
Morale in Activity Classes.—It is pertinent 
to consider the findings in observational as-
pects of the study in the light of impressions
and testimony based upon general observation
and informal reports. Consider first the “neg-
ative work-spirit” category, which occupies
a rather humble position in Tables XIV and
XVII. Under this category were recorded
instances of misbehavior, failure to cooperate
with the class, instances in which pupils ap-
parently took advantage of the absence of
the teacher or abused their privileges when
they were presumably responsible for their
own conduct. Table XIV indicates that the
activity pupils exhibit more such behavior
than do the controls. However, in the judg-
ment of the authors, this fact is not surpris-
ing; the remarkable thing is that the differ-
ence is so slight.

The following excerpts, from the manu-
script previously referred to, bear on this
point.


“The writer was continually impressed 
by the generally good discipline maintained 
by children in activity classrooms. Off- 
hand, one might expect that many children 
would cut loose under a policy of less for- 
mal regimentation and more individual 
freedom, but the writer was struck, as he 
began his round of visits, by the fine man- 
ner in which the pupils responded to the 
opportunities for greater self-direction.” 
(The report goes on to describe an illus- 
trative instance, drawn from a school in a 
very underprivileged neighborhood. The 
writer had an opportunity quietly to saun- 
ter back to a classroom in which the pupils 
had been left in charge of themselves for 
some time. He expected, in the light of 
recollection of his own school days, to find 
that at least some of the pupils would have 
abandoned their work for a bit of horse- 
play. But just the opposite proved to be 
the case):

“The room was quiet. All the children 
were occupied. A number of them were 
reading at their desks. About five of the 
children clustered in the front of the room, 
discussing a class project under the leader- 
ship of one of their classmates. A child or 
two worked at the blackboard. A few scat- 
tered children were busy with writing, 
drawing and other activities. There were 
no furtive, half-guilty glances toward the 
door as it opened. A couple of children 
looked up and smiled, then went on with 
their work .. . No doubt, in their own 
good time some of the children would be- 
come restless and make mischief (they 
would not be normal children if they 
didn’t), but until this time the teacher’s 
absence had made no difference in their 
behavior.” 


Instances in which the children assumed 
responsibility for their own conduct, and for 
disciplining one another, with the teacher 
present in the room could be multiplied in- 
definitely. It is not implied in the foregoing 
that the activity program produced a magic 
change in children, or that discipline gener- 
ally was enforced by an iron hand in the con- 
trol classes, or that there was a universally 
high level of man-to-man cooperation in main- 
taining discipline in the activity classes. In 
this matter, as in all other features, there were 
individual differences between classrooms. It 
did appear, however, that there was no gen- 
eral tendency for the children in activity 
classes to abuse the greater degree of freedom 
and opportunity for self-discipline which 
these classes seemed to afford. Rather, it 
appeared that children are considerably more 
able to take responsibility for their own con- 
duct than is implied in the customary disci- 
plinary practices found in the conventional 
classroom. In a related section of the report 
it is pointed out that, “obviously, no adequate 
estimate of pupil-teacher relationships can be 
formed simply on the basis of informal obser- 
vations; however, from surface impressions it 
does appear that a surprising degree of co- 
operation between pupils and teachers, with 
a minimum of school-masterish discipline, can 
be established in classes that contain pupils 
who at first glance would seem most un- 
promising.” 


The following excerpt is also drawn from 
the previously mentioned manuscript. It is 
introduced to show the conflicting impressions 
that may emerge from a brief visit as con- 
trasted with systematic observation over a 
period of time. In this instance, the regular 
observers who commented all took strong ex- 
ception to the writer’s statements: “On a few 
occasions the writer’s records contain queries 
that are not strictly concerned with the prob- 
lem of discipline but with the problem of 
practical management of activities in the 
classroom. The children in a classroom may 
be well in hand, and each pupil may be hard 
at work, but by virtue of the fact that so 
many things are going on at once (during 
definitely designated “activity” periods), the 
total effect may be one of seeming disorder. 
On one or two occasions, the whirl of activity 
seemed a bit hectic, somewhat wearing to 
watch. It occurred to the writer that it might 
also be a strain to the children.” 


The foregoing remarks were included in 
the original draft of a paper which was sub- 
mitted for criticism and comment to eleven 
observers who have spent a considerable
amount of time in the classrooms. The pre- 
vailing comment by the observers was to the 
effect that the statement gave an erroneous 
impression. According to several reports, 
things may seem confusing to a casual visitor, 
but if the same visitor observed the activities 
from day to day he would find that there 
actually is a high degree of order and con- 
tinuity in what the children are doing. To the 
extent that this is true, the “strain” men- 
tioned above would reside more in the experi- 
ence of the visitor, who has a hard time in 
getting oriented to all that is going on, rather 
than in the experience of the pupils in the 
classroom. The observers also point out that 
the very fact that the children are more free 
to move about seems to relieve tension which 
other pupils, under stricter discipline, seem to 
try to express through furtive pranks and 
mischief-making. There obviously is need for 
more systematic study of this issue. 


It may be pointed out that it was not prac- 
ticable, in the systematic observations, to deal 
with all aspects of social conduct in the class- 
rooms. In view of this, it is perhaps proper 
to report general impressions regarding one 
rather important feature, namely, the man- 
ners and courtesy of the children. What fol- 
lows is frankly based upon personal impres- 
sions; it represents a statement of opinions 
(in which, it appears, most of the systematic 
observers concur). Observers are frequently 
impressed by the politeness shown by New 
York public school children toward visitors 
to the classrooms. To be sure, one might 
question the genuineness of this politeness, 
and one might perhaps question whether the 
same degree of consideration would extend to 
the out-of-school conduct of the same children 
in the streets, in subways, and in places of 
entertainment. Be that as it may, when chil- 
dren exhibit a hospitable and friendly de- 
meanor in the classroom that, at least, is all 
to the good as far as it goes. Now one might 
ask, what seems to be the effect of the activ- 
ity program on children’s manners? Does it 
produce the self-important and somewhat 
smart-alecky attitudes that one sometimes 
encounters in children in ultra-progressive 
private schools? On the basis of informal 
observation, the answer is a decided no. In-
deed, it would appear that in many activity
classes the children have even more opportu-
nity for the exercise and cultivation of good
manners than prevails in the average control
classroom. It appears that activity classes
more frequently employ such policies as hav-
ing official class hosts and hostesses, of ap-
pointing delegates or committee chairmen to
introduce themselves to visitors and show
them around, etc. The oftentimes very gra-
cious conduct of these deputies pleases the
vanity of the adult visitor, to say the least,
and it is likely that many children profit from
the opportunity to acquire and to exercise
some of the techniques of polite social inter-
course.

Opportunities for acquiring improved social 
techniques in child contacts also, no doubt, 
are afforded by various enterprises that are 
common in activity classrooms, including the 
give and take free discussion, participation in 
committee work of various kinds, participa- 
tion in parliamentary rule, etc. The system- 
atic measurements employed in this investi- 
gation do not adequately measure the good 
effects, if any, which come from such oppor- 
tunities. The systematic data do, however, 
indicate that while the frequencies of coop- 
erative behavior in the two types of schools 
are about equal, judges of anecdotal records 
rate the cooperative activities in activity 
schools as being superior in quality to the 
cooperative activities in control schools. The 
activity classes receive consistently higher 
ratings during all five semesters represented 
in the investigation, and the differences are 
statistically reliable during three of the five 
semesters. 


Further General Observations.—Needless
to say, the above comments touch upon only
a few of the many matters that might be
explored. It goes without saying, also, that
many of the enterprises undertaken by chil-
dren in activity schools undoubtedly are more
valuable than others. In the manuscript re-
ferred to above, mention was made of differ-
ences in practice in different schools. To all
appearances, it seems that some teachers are
considerably more successful than others in
promoting a democratic spirit and in encour-
aging equality of opportunity in the class-
room; it sometimes appeared, likewise, that
teachers varied in their resourcefulness in
integrating units of work. At times it ap-
peared that a program of work designed to
integrate various activities, such as reading,
composition, “research”, drawing, spelling,
pantomime and dramatization, represented a
type of integration that existed more in the
abstract plans of the teacher than in the
minds of the children. Likewise, in connec-
tion with the drawing, illustrative work,
puppet-making, etc., that several classes un-
dertook in connection with various units of
study, a question might be raised as to the
extent to which such enterprises serve as an
aid to intellectual mastery, to what extent
they produce useful artistic skills, and to
what extent they simply represent busy-work.
(It should be added that when, in the report
referred to above, a question was raised as to
the value of the time spent by the children in
drawings and illustrative materials in connec-
tion with their studies, the observers almost
to a man protested that in their judgment
these enterprises were valuable, both from an
intellectual and artistic point of view). At
any rate, there are many questions as to pro-
cedure, choice of activity, and profitable use
of time that will occur to the observant vis-
itor. It may also be said that the problem as
to the best selection of activities is one that
is constantly being considered by individual
classroom teachers and by all who are con-
cerned with the planning of the curriculum.


SUMMARY 
This report deals with findings obtained in
a study designed to evaluate aspects of a new
educational program instituted in a number
of New York City public schools. The pro-
gram, known as the “activity program,” rep-
resents an effort to put into practice progres-
sive principles of teaching, curriculum devel-
opment, and class management. This report
presents results obtained by means of various
types of measurement applied during each
school term from the spring of 1937 through
the spring of 1939.


POPULATION STUDIED


The results give comparisons between
paired “activity” classes and “control” classes
(i.e., classes in which the newer practices had
not officially been introduced). The number
of pairs of classes in the study ranged from
32 to 48 each semester, including more than
two thousand pupils, and consisting mainly of
4th, 5th, and 6th grade classes but including
also, during some semesters, classes from the
1st, 2nd, and 3rd grades. In the case of the
tests of achievement, data were available for
larger numbers than were included in the
main body of the study.


PROCEDURES 


The measurements that were applied have 
been described in detail in earlier sections. 
They include: 

The Modern School Achievement Test, in- 
cluding six sub-tests dealing respectively with 
reading comprehension, reading speed, arith- 
metic computation, arithmetic reasoning, 
spelling, and language usage. This test was 
administered in June 1937, January 1938, 
and January 1939. 

A battery of six tests devised by Wright- 
stone, to measure certain intellectual skills 
and aspects of social and personal adjust- 
ment. These tests were administered in the 
spring of 1937, 1938, and 1939. One of these 
is a test of working skills in the social 
studies; it tests the pupil’s ability to extract 
information from tables and graphs, his abil- 
ity to tell where to locate certain types of 
information, and to use an index. Another 
test covers the pupil’s ability to explain facts 
in the social studies; certain data are pre- 
sented by way of tables and graphs or verbal 
exposition and the pupil is required to judge 
the correctness or pertinence of interpretative 
statements that follow. A third test measures 
the pupil’s ability to apply generalizations in 
connection with subject matter in the social 
studies; statements of facts or happenings are 
presented, followed by a number of general- 
izations, and the pupil is required to select or 
mark those generalizations which are relevant 
to the exposition or (in a later form of the 
test) to match statements and appropriate 
generalizations. Another test is designed to 
measure social beliefs and attitudes; the 
pupil is asked to mark as true or as false a 
number of varied statements, some of which 
are supposed to represent a more progressive 
and some a less progressive point of view 
with regard to controversial issues. A fifth 
test measures knowledge of current affairs. 
A sixth test, patterned on the customary 
paper and pencil type of inventory, includes 
a number of items which are deemed to be 
symptomatic of personal or social adjust- 
ment, and the pupil is called upon to mark 
whether he always, sometimes, or rarely ex- 
hibits the behavior in question; the test yields
two scores, one for personal adjustment and 
one for social adjustment, and these may be 
combined into a single personal-social adjust- 
ment score. 

The Comprehensive Achievement Test, 
prepared by McCall and Herring, including 
brief sub-tests covering a variety of areas, 
such as health and play, reading, arithmetic, 
buying and using things, foreseeing conse- 
quences, understanding of people and things, 
manners, etc. This test was administered in 
the spring of 1937, 1938, and 1939. 

A School Practices Questionnaire, prepared 
by McCall, Herring and Loftus, which em- 
bodies descriptions of a number of practices 
and procedures that presumably should char- 
acterize an activity program, and which was 
designed to give an indication of the extent 
to which the avowed purposes of the activity 
program actually were being put into prac- 
tice in the classroom as revealed by the an- 
swers given by the pupils to the items in the 
questionnaire. 

The McCall Intelligence Test, which is a 
multiple-response intelligence test of the 
multi-mental type. 

The method of direct observation was 
applied during all five semesters of the study. 
The classroom observations consisted of 
(1) “coded” records, designed to measure the 
frequency of various forms of behavior occur- 
ring in the classroom, and (2) “anecdotal” 
records, which were designed to provide rec- 
ords which could be used as a basis for rat- 
ings of the quality of the behavior or per- 
formances in question. The observers noted 
behavior under seven headings as follows: 
cooperative activities, experimental activities, 
leadership activities, self-initiated activities, 
recitational activities, critical activities, and 
negative work-spirit activities. 

In connection with both the “coded” and 
the “anecdotal” records, a given worker 
would observe an entire class at a time and 
record as much as possible of the behavior 
that occurred. The data obtained by direct 
observation were treated primarily in terms of 
class averages and were not analyzed from 
the point of view of the scores of individual 
pupils, except in the case of a few incidental 
comparisons. 

The observer personnel during a given 
semester included from eight to thirteen per- 
sons, each of whom, during the course of the 
semester, observed four activity and four con-
trol classes. Each observation period was
scheduled to last half an hour. The number 
of periods of “coded” observations per class 
per semester ranged from six to fourteen, 
with a minimum of twelve periods during 
each semester from the spring of 1938 through 
the spring of 1939. The number of “anec-
dotal” observations ranged upward to twenty
or more per semester. 

In the treatment of the results of the 
“coded” records, a tally was made of the 
recorded instances of behavior in each of the 
different categories and from this was ob- 
tained a value representing the average num- 
ber of occurrences of the behavior in question
per pupil per period of observation. In order 
partially to eliminate cumbersome decimals, 
this value was multiplied by 100 to give the 
hypothetical average per pupil per 100 peri- 
ods of observation. This value was separately 
obtained for each of the classes and compar- 
isons between the activity and the control 
groups are based upon the means of these 
class averages. 
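
The scoring arithmetic described above can be sketched as follows. This is a minimal illustration only, not a reconstruction of the investigators' worksheets: the function names and the sample tallies are hypothetical, and it is assumed that each class is scored from its total tally, its number of pupils, and its number of observation periods.

```python
from statistics import mean

def class_score(tally, pupils, periods):
    """Occurrences per pupil per period, scaled to 100 periods."""
    return 100 * tally / (pupils * periods)

def group_mean(classes):
    """Mean of the class averages for one group of classes."""
    return mean(class_score(tally, pupils, periods)
                for tally, pupils, periods in classes)

# Hypothetical data: (recorded instances, pupils, observation periods)
activity_classes = [(126, 35, 12), (84, 30, 14)]
control_classes = [(42, 35, 12), (63, 30, 14)]

print(group_mean(activity_classes))  # 25.0
print(group_mean(control_classes))   # 12.5
```

Because the group value is a mean of class averages, each class counts equally regardless of its enrollment, which is in keeping with the statement that the data were treated primarily in terms of class averages.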

In the treatment of the anecdotal records, 
a compilation was made of the instances of 
behavior of a given category exhibited by 
each child. This account was then rated for 
quality, on a scale from zero to ten, by four 
judges. The average of the four judgments
was taken to represent a given child's score. 
The class average was in turn computed and 
the average of such class averages was used 
as the basis of comparison between the activ- 
ity and the control groups. The ratings of 
these “anecdotal” records were designed to 
give a measure of the quality rather than the 
frequency of the performances in question, 
but since the ratings were influenced to an 
undetermined degree by the frequency of the 
items that were being rated, the results are 
not purely concerned with the factor of 
quality alone. 
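
The three levels of averaging applied to the anecdotal ratings (judges, then children, then classes) may be illustrated by the sketch below. The ratings shown are invented for the example, and the function names are hypothetical; only the order of averaging is taken from the description above.

```python
from statistics import mean

def child_score(judgments):
    """Average of the four judges' ratings (each from 0 to 10) for one child."""
    if len(judgments) != 4:
        raise ValueError("expected ratings from four judges")
    return mean(judgments)

def class_average(children):
    """Average of the child scores within one class."""
    return mean(child_score(j) for j in children)

def group_average(classes):
    """Average of the class averages within one group."""
    return mean(class_average(c) for c in classes)

# Hypothetical ratings: one inner list of four judgments per child
class_a = [[6, 7, 5, 6], [4, 4, 5, 3]]
class_b = [[8, 7, 8, 9], [2, 3, 3, 4]]

print(class_average(class_a))            # 5
print(class_average(class_b))            # 5.5
print(group_average([class_a, class_b])) # 5.25
```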


RELIABILITY OF THE INSTRUMENTS 


An account of the reliability of the various 
instruments used in the study is given in 
Part III. The reliabilities of the paper and 
pencil tests compared favorably with the reli- 
abilities customarily reported for standardized 
tests. 

In connection with the “coded” observa- 
tions, measurements were made to determine 
the reliability of the observers through com- 
parisons between records obtained by two 
workers who simultaneously but independ-
ently observed the same behavior. The re-
sults of these computations showed varying 
degrees of agreement in the case of the dif- 
ferent categories, and variations also between 
different observers. The observer reliabilities 
were deemed to be sufficiently high for the 
purpose of the type of comparisons made in 
this study. The reliability of the sampling of 
“coded” observations was measured by cor- 
relating the results of the first five as against 
the second five of ten observation periods. 
The reliability coefficients thus obtained, 
after correction by the Spearman-Brown
formula, ranged from .01 to .95 in the case
of the data obtained in 25 classes in seven 
control schools, and from .30 to .94 in the 
case of 26 classes in seven activity schools. 
The range in the case of all the 51 classes 
combined was from .44 to .94. In the latter 
computations, the lowest reliability appeared 
in connection with experimental activities and 
self-initiated activities, with respective co- 
efficients of .44 and .60. 
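
The split-half procedure with Spearman-Brown correction can be sketched as follows. The correction for a test doubled in length is the standard form, r_full = 2r / (1 + r), where r is the correlation between the two halves. The class averages in the example are invented, and the helper names are hypothetical; it is assumed here that each class contributes one average for the first five periods and one for the second five.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_half):
    """Estimated reliability of the full series from the half-series correlation."""
    return 2 * r_half / (1 + r_half)

# Hypothetical class averages for one category of behavior
first_five = [12.0, 30.0, 18.0, 25.0, 9.0, 21.0]
second_five = [15.0, 26.0, 14.0, 29.0, 11.0, 17.0]

r_half = pearson(first_five, second_five)
print(round(r_half, 2), round(spearman_brown(r_half), 2))
```

The correction always raises a positive half-series coefficient, so a corrected value of .44 corresponds to a considerably lower correlation between the two halves.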

When reliability coefficients were computed 
on the basis of the scores of individual chil- 
dren, rather than on the basis of class aver- 
ages, the values, after correction by the 
Spearman-Brown formula, ranged from .317
to .563. Factors which make for unreliability
in connection with the “coded” observations 
are discussed at some length in Part III. It 
is concluded that data based upon these ob- 
servations may be used for gross comparisons 
between large groups, but are not sufficiently 
discriminative or reliable to serve as the basis 
for intensive study of individual children or 
for detailed analysis of outcomes or effects 
of specific practices. 


From a practical point of view, it also was 
set forth in Part III that it is considerably 
more difficult for an observer to note and re- 
cord what occurs when the pupils are working 
on individual projects, or are clustered in 
several more or less independent groups, than 
when the entire class, with each member in 
his own seat, is working on a common task 
under the close supervision of the teacher. 
Since the former conditions of freedom pre- 
vailed more often in the activity than in the 
control classes, it is likely that with an in- 
crease in the reliability and completeness of 
the observers’ records the increase in the 
number of recorded items would be relatively 
larger in the case of the activity classes than 
in the case of the control classes (except pos-
sibly in the category of “recitational activ-
ities”). To the extent that this is true, it is 
likely that more reliable recording would pro- 
duce an increase rather than a decrease in 
the differences between the scores of the two 
groups, and thus accentuate even more the 
trend of the findings reviewed below. How- 
ever, it would require more intensive and 
detailed study to measure what the effect 
might be on each of the several categories of 
behavior included in the “coded” observa- 
tions, and to discover whether the scores 
might be changed more in some categories 
than in others. 


The reliability of the data obtained by 
means of “anecdotal” observations was 
measured (1) by correlating the ratings as- 
signed to a substantial selection of records 
by two pairs of independent judges, and 
(2) by correlating the scores obtained when 
the same squad of judges separately rated 
the records submitted by pairs of workers 
who made simultaneous but independent 
observations of what occurred in a number of 
classes. These correlations were made on the 
basis of class averages, in keeping with the 
policy used in the treatment of these data. 
The correlation between scores assigned by 
different pairs of raters ranged above .90
(after correction by the Spearman-Brown
formula). The correlation between average 
scores assigned to records submitted by dif- 
ferent observers in the same classes ranged 
from .31 to 1.00. The median coefficient in 
21 such correlations (one for each of the 
seven categories of behavior in the results of 
activity and control groups combined during 
one term, and in the separate results of the 
activity and control groups during another 
term) was .81. In the first series of correla- 
tions, the lowest reliability appeared in con- 
nection with cooperative activities (.38), 
and, in the second series, in the case of 
“negative work-spirit” (respective coefficients 
of .67 and .31 in the activity and the control 
data). It appears that in the case of most of 
the categories of behavior treated in this 
study, the anecdotal records supply data 
sufficiently reliable to show general trends in 
comparisons between large groups. To pro- 
vide material for more detailed analyses, or 
for intensive study of individual children, it 
would be necessary to modify the underlying 
procedure. 



RESULTS 

School Practices Questionnaire.—Results 
obtained by means of the School Practices 
Questionnaire, administered to 4th, 5th, and 
6th grade classes, are available for three sep- 
arate semesters. The scores of the activity 
classes were consistently higher than scores 
of the control classes, and the differences 
were statistically reliable. The respective 
average activity and control scores for all 
grades in each group combined, on the three 
administrations of the test, were 49.44 and 
42.32; 50.51 and 43.35; and 55.80 and 43.95. 
The possible obtainable scores on the test 
ranged from 0 to 105. Although these dif-
ferences consistently and reliably favor the
activity groups, the absolute differences be-
tween the two averages in each comparison 
do not appear to be large. 

Since the test calls for answers as to 
whether or not a given practice has occurred 
during a period of time rather than for de- 
tailed information as to the frequency of the 
practice and the extent to which it has char- 
acterized the work of the class from hour to 
hour or from day to day, the test perhaps 
does not fully measure such differences as 
may prevail. To measure the actual differ- 
ences in practice and procedure would either 
require a more discriminating test or detailed 
observations in the individual classrooms. 

In any event, the results show that there 
unquestionably is a real difference between 
the two groups. Moreover, the averages indi- 
cate that this difference has increased some- 
what with the passage of time. Further, 
comparisons between the different grade 
groups suggest that the differential between 
activity and control classes increases some- 
what as one goes from the 4th to the 6th 
grade. 
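
The size and trend of the differences on the School Practices Questionnaire can be checked directly from the averages reported above. The short computation below uses only those six averages and the stated maximum score of 105; expressing each difference as a percentage of the possible range is an added convenience, not a figure given in the report.

```python
# (activity average, control average) for the three administrations
averages = [(49.44, 42.32), (50.51, 43.35), (55.80, 43.95)]
MAX_SCORE = 105  # possible scores ranged from 0 to 105

differences = [round(a - c, 2) for a, c in averages]
percent_of_range = [round(100 * d / MAX_SCORE, 1) for d in differences]

print(differences)       # [7.12, 7.16, 11.85]
print(percent_of_range)  # [6.8, 6.8, 11.3]
```

The differences are small in relation to the possible range, and the largest occurs on the third administration, which agrees with the statement that the difference increased somewhat with the passage of time.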

Tests of Intellectual Skills, Social Attitudes 
and Adjustment.—No substantial or reliable 
differences appeared in the results of the
various tests of the Wrightstone series. The 
activity children scored somewhat higher than 
the control children when the tests were first 
administered. During later semesters they 
lost most of this advantage on some of the 
tests. On the final tests (spring 1939), the 
control children were slightly, but unreliably, 
in the lead in tests of working skills, and 
ability to explain facts. 

The two groups were equal in scores on a 
test of ability to apply generalizations. 


The activity children surpassed the con-
trols in tests of knowledge of current affairs,
in tests of social beliefs, personal adjustment,
and social adjustment. In no comparison was
the difference statistically reliable, although
it may be pointed out that the activity chil-
dren maintained a consistent, although not
large, advantage over the controls since the
inception of the study in three of the tests,
namely, knowledge of current affairs, social
beliefs, and personal adjustment.

Modern School Achievement Tests—Re- 
sults are available from tests administered jn 
January 1937, January 1938, and January 
1939. 

On two of the three series of tests, the con- 
trol children were slightly, but unreliably,
superior to the activity children in reading 
comprehension. On all three test series, the 
control children were somewhat superior in 
reading speed, but not to a statistically reli- 
able degree. 

The control children maintained a substan- 
tial advantage over the activity children in 
arithmetic computation and arithmetic rea- 
soning throughout all three series of tests. 
The difference in favor of the control chil- 
dren in arithmetic was statistically reliable in 
three of the six comparisons and approached 
statistical reliability in the other three. 

The activity children held a small lead in 
spelling on the first test, but fell behind the 
control children on the last two tests; in each 
of these comparisons the difference falls 
short of complete statistical reliability. 

In language usage, the activity children 
likewise surpassed the control children on the 
first test, but fell behind on the last two; the 
difference in favor of the control children was 
not statistically reliable on either of the last 
two tests, but approached reliability on the 
January, 1939, test. 

The general drift of the results of the 
Modern School Achievement Test was toward 
increasing superiority of the control children 
with the passage of time. 

Factors Not Measured by the Various 
Tests—By way of recapitulation of points 
set forth in an earlier section dealing with 
the results of tests of achievement, it may be 
pointed out that these instruments serve, in
the main, to measure children’s competence 
in projects that are common to both the ac- 
tivity and the control schools rather than the 
outcomes of such additional projects and
experiences as are provided by the more 
flexible and diversified program of the activ- 
ity schools. 

Among such unmeasured features are pos- 
sible gains that pupils may derive from 
endeavors that are integrated with the reg- 
ular studies in many classes; gains in artistic 
performance; gains in oral composition and 
in other skills associated with class discus- 
sion, participation in planning and evaluation, 
and in parliamentary procedures; and the 
values that may be derived from various spe- 
cial study projects, individual “research,” and 
other departures from the conventional out- 
line of study. Neither the standardized tests 
nor the observational techniques applied in 
this study give an adequate measure of the 
nature or the outcomes of such projects, nor 
do they answer the question as to whether 
such outcomes offset, or possibly outweigh, 
the benefits derived by the control pupils 
from superior competence in the conventional 
school subjects (notably arithmetic). 

Such evidence as is available indicates that 
the benefits which pupils may have derived 
from the activity program are not of such a 
nature as to sharpen the children’s compe- 
tence in the general tests that were applied 
to measure working skills, ability to interpret 
facts, and ability to generalize in the social 
studies. It appears that such learnings as do 
occur must be tested by instruments more 
pointedly designed to measure outcomes of 
the particular enterprises which the pupils 
have undertaken. 

Results of Observational Studies —The 
activity classes made considerably higher 
scores than did the controls, both in the re- 
sults based upon the frequency of various 
performances (coded observations) and the 
quality of the performances (anecdotal ob- 
servations) in a majority of the factors 
studied by means of direct observation. Only 
in the case of recitational activities (consist- 
ing primarily of pupil responses to teacher 
questions or directions during formal recita- 
tion) did the controls exhibit substantially 
higher average scores. (The values cited in 
the comparisons that follow are drawn mainly 
from Table XIV which shows results for five 
semesters and represents from 32 to 48 pairs 
of classes). 

Both groups exhibit more recitational ac-
tivities than any other performance that was
studied by means of direct observation (the 


averages for recitational activities range from 
37 to 76, while the averages for the other 
categories range from 27 to less than 2), but 
during the various semesters, the averages of 
the control group ranged from about 30 to 
about 90 percent higher than the average of 
the activity group. The differences were sta- 
tistically reliable in all comparisons except 
those based on results for the first semester 
of the study. The results show that the con- 
ventional type of recitation is emphasized con- 
siderably more in control than in activity 
classes. 

The activity classes exhibited from two to 
three times as many self-initiated activities
as did the controls. The differences were con- 
sistently reliable. 


The activity children exhibited about twice 
as many critical activities as did the controls. 
Although the differences between the aver- 
ages are relatively large, and consistently and 
impressively favor the activity groups, they 
fall short of complete statistical reliability in 
four of the five semesters (the critical ratios 
range from 2.17 to over 3). 
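The critical ratios cited above can be illustrated with a short computation. The sketch below assumes that a critical ratio is the difference between two group means divided by the standard error of that difference, that the two groups are independent, and that a ratio of 3 is the threshold for complete statistical reliability. The function name and every number in it are hypothetical; none is taken from Table XIV, and the study itself may have used a formula for paired classes.

```python
import math

def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by the standard error of the difference.

    Assumption: the two groups are treated as independent samples.
    """
    se_diff = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_diff

# Hypothetical averages for "critical activities" (not data from the study).
cr = critical_ratio(mean_a=8.0, sd_a=6.0, n_a=40, mean_b=4.0, sd_b=5.0, n_b=40)
print(round(cr, 2))
print("reliable" if cr >= 3 else "short of complete reliability")
```

A ratio between 2 and 3, such as several of those reported here, would under this convention favor one group without meeting the stricter criterion.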


The average scores of the activity children 
in experimental activities are consistently and 
substantially higher than the control aver- 
ages (the averages of the former ranged from 
about 25 to over 80 percent higher than the 
averages of the latter). The differences were 
statistically reliable during the last three 
semesters of the study, but not during the 
first two semesters. 

During three of the five semesters, the 
activity children somewhat surpassed the con- 
trols in frequency of cooperative activities, 
but no comparison shows a statistically reli- 
able difference. The scores in ratings of qual- 
ity of cooperative activities (based on anec- 
dotal records) consistently favor the activity 
groups, and the difference in favor of the 
activity children is statistically reliable dur-
ing three of the five semesters. Even during
the two semesters when the activity classes 
lagged slightly behind the control classes in 
frequency of cooperative activities they still 
surpassed the controls in average ratings for 
quality (and in both instances this difference 
is statistically reliable). 

During all five semesters, the activity 
groups surpassed the control groups in fre- 
quency of leadership activities, and in four of 
the five comparisons the difference is statist- 
ically reliable. In most of the comparisons, 


the differences are relatively large, but the 
absolute scores of both groups are relatively 
small (ranging from 1.44 to 3.47 in the activ- 
ity groups and from .32 to 3.18 in the con- 
trol groups). In the ratings of quality, the 
activity children consistently, and to a 
statistically reliable degree, show higher 
scores in leadership than do the controls. 


The averages in the category under which 
were tallied items of behavior representing 
negative work-spirit are low and inconclusive. 
To the extent that the data under this head- 
ing can be taken to represent the discipline 
maintained in the classroom, the results, as 
far as frequency of behavior is concerned, 
indicate that the control children exhibited 
somewhat better discipline during the first 
three of the five semesters, while the activity 
children showed better discipline than did the 
controls during the last two of the five 
semesters. In no case, however, is the differ- 
ence large, and the differences are quite un- 
reliable. As pointed out in earlier discussion, 
the direction of the differences between the 
two groups with respect to scores in this cate- 
gory does not seem to be as significant as is 
the fact that the difference, in each compar- 
ison, is small and negligible. The greater 
degree of freedom and self-direction afforded 
by the activity program has not been a signal
for poor discipline and disorder on the part 
of the pupils. Rather, according to the find- 
ings in connection with the “work-spirit” 
category, as well as according to the inde- 
pendent testimony of observers who have 
visited many classes and have gone to the 
same classes day after day, the pupils have 
risen to the occasion in a highly satisfactory 
way. Reports based upon informal observa- 
tion likewise give a favorable account of the 
manners and courtesy exhibited by the chil- 
dren in activity classes. 

As indicated by the statements above, the 
observational data demonstrate rather large 
differences in favor of the activity groups 
with respect to a number of presumably de- 
sirable forms of behavior. It may be recog- 
nized, of course, that differences of one sort 
or another would be expected from the very 
fact that the activity program was designedly 
different from the traditional program. With- 
out empirical data, however, it would not 
have been possible to determine just what 
turn these differences would take, or to know 
to what extent pupils would respond to the 


opportunity to manifest the various forms of
behavior that were studied by the method of
direct observation. The data definitely show 
that as a result of the opportunities afforded 
in activity classes, the pupils, while partici-
pating considerably less in formal recitation,
have shown a decided increase in evidences 
of initiative and experimentation; have more 
often assumed the role of leadership; have 
participated more actively in criticism and 
appraisal of one another’s work without, at 
the same time, showing a corresponding de-
cline in evidences of cooperation but, rather,
while showing a higher quality of cooperation 
as measured by ratings of anecdotal accounts 
of classroom behavior. Further, while the 
children thus have responded to opportunities 
for participation and self-expression, and have 
availed themselves of the greater degree of 
freedom afforded by the activity program, 
they apparently have not abused their oppor- 
tunities, for the two groups were found to be 
substantially similar in tallies and ratings 
relating to classroom conduct and discipline. 

To determine the immediate values, or the 
ultimate benefits, of the various performances 
in which the activity children surpassed the 
controls would require more intensive study. 
More intensive investigation likewise would 
be required to determine the extent and 
nature of other learnings and performances 
that were not measured by the methods so 
far employed, and to explore the extent to 
which such learnings might offset, or out- 
weigh, the somewhat greater degree of com- 
petence that the control pupils meanwhile 
had gained in academic subjects
(notably arithmetic). To the extent, how- 
ever, that an increase in quality of coopera- 
tion, and in frequency of performances such 
as those included under the headings of ini- 
tiative, experimentation, critical activities and 
leadership may be regarded as valuable out- 
comes per se, the observational data indicate 
that important objectives of the activity pro- 
gram have been achieved to a large degree. 


BIBLIOGRAPHY 


1. McCall, William A. and Loftus, John J.
“America’s Largest City Experiment with
a Crucial Educational Problem.” Teachers
College Record, Vol. 38, pp. 602-606,
April 1937.

2. McCall, William A., Herring, John P.
and Loftus, John J. “Measuring the
Amount of Activity Education in Activ-
ity and Control Schools in New York
City.” Teachers College Record, Vol. 39,
pp. 230-240, December 1937.

3. McCall, William A., Herring, John P.
and Loftus, John J. “Measuring Achieve-
ment in Activity and Control Schools in
New York City.” Teachers College Rec-
ord, Vol. 39, pp. 423-431, February 1938.

4. Wrightstone, J. Wayne, Rechetnick,
Joseph, McCall, William A. and Loftus,
John J. “Measuring Intellectual and
Dynamic Factors in Activity and Control
Schools in New York City.” Teachers
College Record, Vol. 40, pp. 237-244,
December 1938.

5. Wrightstone, J. Wayne, Rechetnick,
Joseph, McCall, William A. and Loftus,
John J. “Measuring Social Performance
Factors in Activity and Control Schools
in New York City.” Teachers College
Record, Vol. 40, pp. 423-432, February
1939.

6. Wrightstone, J. Wayne, Rechetnick, J.,
and Loftus, J. J. “Results of Compre-
hensive Achievement Test in Controlled
Experiment.” Teachers College Record,
Vol. 39, pp. 431-432, February 1938.

7. McCall, William A. and Herring, John P.
“Comprehensive Achievement Test.”
Laidlaw Brothers, 1937.

8. Gates, Arthur I., Mort, Paul R.,
Symonds, Percival M., and others. “The
Modern School Achievement Tests.”
Bureau of Publications, Teachers Col-
lege, Columbia University, 1931.

9. McCall, William A. “Intelligence Test.”
Laidlaw Brothers, New York, 1937.

10. McCall, William A., Herring, John P.,
and Loftus, John J. “School Practices
Questionnaire.” Laidlaw Brothers, New
York, 1937.

11. Jersild, A. T. and Associates. “A Study
of the Elementary Classrooms in Action.”
1939. Unpublished.

12. Arrington, R. E. “Interrelations in the
Behavior of Young Children.” Child
Development Monographs, No. 8, Teach-
ers College, Columbia University, Bureau
of Publications, 1932.


STUDIES IN VISUAL AND AUDITORY MEMORY SPAN WITH 
SPECIAL REFERENCE TO READING DISABILITY 


Nicholas D. Rizzo
The Rivers School, Brookline, Mass. 


I. Introduction. Stated in the most general 
terms, the purpose of this investigation was 
to study three types of memory span,¹ namely,
tachistoscopic visual span, auditory span, and
temporal visual span by means of three tests
devised by the writer,² and to discover the
relationship between measures of memory 
span and objective ratings of reading ability. 
Specifically, an attempt has been made to 
answer the following questions: 

1. How reliable are the three tests of 
memory span which were employed in this 
study and how do the reliabilities for the 
three tests compare? Since it was possible to 
score the test records in at least three ways, 
it seemed desirable to compute also the reli- 
ability coefficients for each scoring method 
investigated. 

Reliability coefficients for previous tests of 
memory span vary widely, owing to the fact 
that experimental procedures have differed 
greatly from one study to another. These 
variations have been especially pronounced 
for tests of auditory memory span, and it 
would seem that each investigator secured 
different figures. For several of the previous 
tests the following coefficients were obtained: 
Bolton (1931),³ .28 for her nine-trial test of
auditory memory span administered to 45
college undergraduates; Peatman and Locke
(1934), .87; Davis (1932), .74; Hao (1924),
.52; Gates (1916), .62; Garrett (1928), 8:
Mitchell (1919), .47; and Abelson (1911),
.73, for his test of memory span in which the
subjects both heard and saw digits simultane-
ously. Considering the fact that Abelson’s
subjects ranged from 8 to 15 years in chrono-
logical age, his retesting coefficient would not
seem to indicate a highly reliable test.

¹ The definition of memory span employed in this study can
be traced to the early work of Jacobs (1887) who devised a
“memory span test to determine the maximal number of re-
lated or unrelated elements which a testee can reproduce
exactly (immediately) after a single presentation”. Warren
(1930) makes a finer distinction in that he uses “logical
memory” to designate the capacity to retain meaningful mate-
rial as compared with the term “memory span” to indicate
memory for meaningless material. Other terms employed by
Warren to designate the latter function are “attention span”
and “range of apprehension”. English (1934) uses the term
“memory span”. Crosland (1928) writes of “range of atten-
tion”; Tinker (1929) speaks of “visual apprehension”. It is
evident from an examination of what previous writers actually
meant that there is agreement on what memory span is
despite the fact that the terminology varies. For the purpose
of the present investigation, memory span is defined as the
number of unrelated letters which can be reproduced imme-
diately in writing after a single presentation.

² The writer wishes to express his indebtedness to the fol-
lowing persons for valuable advice and help in the construc-
tion of the memory span tests: Mr. James Brewster, Harvard
Film Service; Professor Irving H. Anderson, University of
Michigan; and Professor Walter F. Dearborn, Harvard
University.

³ Citations refer to references in the bibliography, which
are arranged in alphabetical order under the authors’ names.


The reliability coefficients reported for 
tests of visual memory span are, in general, 
somewhat larger than the coefficients reported 
for auditory tests. An adaptation of the audi- 
tory test of Davis for visual presentation 
yielded a coefficient of .84. Hao’s visual digit 
span test possessed a reliability of .83. Burt 
(1909) found a coefficient of .93 for a test of 
visual span administered to high school pupils. 
Garrett’s (1928) test of visual memory span 
administered to 200 college students had a 
reliability of .68. The visual tests to which 
reference has been made were not tachisto- 
scopically presented. And credit was given 
only to reports in which the original order of 
the stimulus units was preserved.

In a tachistoscopic visual test administered 
by Crosland and Johnson (1928) to 30 adults 
the reliability coefficients ranged from .73 to 
.96. The reliabilities reported by Tinker 
(1932) were based on tests of visual span 
administered to 300 college students. They 
ranged from .87 to .95, the majority of them 
being in the neighborhood of .90. Tinker
stated that a scoring method which credited 
both correctly and incorrectly placed items 
was the most reliable. 

In summary, from an examination of the 
literature, it is difficult to state just how reli-
able separate tests of memory span are, not
to speak of their comparative reliability, be-
cause the conditions under which the various
experiments have been conducted were either
poorly controlled or inadequately standard-
ized. The methods of presentation and scor-
ing vary widely in the studies reviewed. The
chronological age ranges of the subjects em-
ployed were not comparable. No attempt has


been made to discover whether the age of the 
subjects has any influence on reliability. In 
some studies the writers said little or nothing 
about scoring method, assuming that there 
could be no doubt concerning the identity of 
the method employed. The implications of 
these observations⁴ for the present study are
clear. Similar tests of memory span should 
be administered to children at various grade 
levels under carefully controlled conditions. 
Reliabilities for three methods of scoring each 
test will be computed at each grade level. 
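The text at this point does not say how the reliability coefficients were to be obtained. As one possible illustration, the sketch below assumes a split-half procedure: each pupil's trials are divided into odd and even halves, the halves are correlated with the Pearson coefficient, and the result is stepped up with the Spearman-Brown formula. The function names and the scores are hypothetical and are not drawn from the study.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_reliability(trial_scores):
    """trial_scores holds one list of per-trial scores for each pupil.

    Assumption: odd-even split, stepped up with the Spearman-Brown formula.
    """
    odd = [sum(scores[0::2]) for scores in trial_scores]
    even = [sum(scores[1::2]) for scores in trial_scores]
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)

# Hypothetical gross span scores for five pupils on four trials each.
pupils = [[5, 6, 5, 6], [3, 4, 4, 3], [7, 7, 8, 8], [2, 3, 2, 2], [6, 5, 6, 6]]
print(round(split_half_reliability(pupils), 2))
```

Running the same function once per scoring method and per grade would give the grid of coefficients that the paragraph above describes.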

2. What are the relationships between 
methods of scoring the same response to a 
memory span test? In the event that the three 
methods of scoring the same test are found to 
be equally reliable, it will be important to 
know to what extent scores obtained for 
each method are essentially measures of a 
common ability. 

The literature on memory span has not 
offered a great amount of evidence on the 
problem of the relationship between various 
methods of scoring the same response.
Tinker’s (1932) study is the only one in 
which an attempt was made to determine the 
relationship between different scoring meth- 
ods. He found a high correlation (.87) be- 
tween a method which credited only correctly 
placed responses and another method which 
credited responses regardless of placement. 


3. What are the interrelationships between 
the three tests of memory span here investi- 
gated? The analysis to be made in this con- 
nection should throw light on the problem of 
whether there is a general memory span abil- 
ity, or whether memory spans vary independ- 
ently. If the intercorrelations between com- 
parable scores for different tests are uniformly 
high, it may be argued that there is a general- 
ized memory span ability. If, on the other 
hand, the intercorrelations are uniformly low, 
it will be necessary to postulate the existence 
of separate functions. Perhaps it will be 
found that, whereas the correlations between 
the two visual tests are high, the correlations 
between each of these and the Auditory Test 
are low. Or it may be found that the correla- 
tions between the two tests employing suc- 
cessive stimulation may be high, while those 
between each of these tests and the Tachisto- 
scopic Visual Test are low. In such a case, 
mode of presentation rather than sense field
is the important determinant of whatever
relationships are found.

⁴ For a summary treatment of the limitations of previous
memory span studies the reader is referred to the study
of Blankenship (1938).

Several attempts have been made to dis- 
cover the relationship between auditory and 
visual span. Hao (1924) has reported a cor- 
relation coefficient of .39 between the audi- 
tory and visual digit spans of 83 sixth-grade 
Chinese children. Gates (1916) has reported 
a coefficient of .62 between the auditory and 
visual forms of the same test of digit memory 
administered to 165 college students. Bennett 
(1916) reported a correlation of .51 between 
visual and auditory spans for nonsense syl- 
lables, and a correlation of .62 between the 
two spans when digits were used. Bennett’s 
subjects were nine college students. Hump- 
stone (1919) has secured coefficients of .52 
between auditory memory for digits and 
memory for unrelated words, and .37 between 
digits and sentences, using 565 college men 
and women as subjects. These data of Hump- 
stone indicate that even auditory memory 
span is not a single ability when different 
types of material are employed. 

The problem of the present study is well 
defined when we consider how incompletely 
the question of the relationship between spans 
has been treated. In the studies where tests 
of temporal visual and auditory spans were 
employed, there is some question as to 
whether sufficient care was taken to insure 
that the stimulus series in the two tests were 
presented in a comparable manner, especially 
with reference to the interval between single 
items and the time for which the items within 
a series were seen or heard. Since no previous 
study reviewed here has included a tachisto- 
scopic test, no data on the relationship be- 
tween scores on a tachistoscopic test and 
scores on temporal visual and auditory tests 
are available. There has been no attempt to 
state in objective terms whether the relation- 
ship between auditory and visual spans in- 
creased or decreased with increased age. It 
is not possible to evaluate fairly the correla- 
tions cited above because the types of mate- 
rial used varied over such a wide range. In 
this research an analysis will be made in 
which the variables will be based on similar 
methods of scoring three different tests of 
memory span composed of the same types of 
materials. In the two tests where the stimulus 
units were presented serially, the interval be- 
tween letters was constant. The correlations 
will be computed in each of the eight grade 
levels included in the research. 


4. What effect do chronological age, mental 
age, and grade placement have on the size 
of memory span scores? Are there any differ- 
ences in function of mode of presenting the 
tests, or of sense field involved? Does in- 
creased age affect these differences? 

The results of several previous investiga- 
tions are valuable from the point of view of 
the application of memory span tests as 
measures of certain phases of mental develop- 
ment. The results of Starr (1924), McCaul- 
ley (1923), Terman (1916, 1917), Smedley 
(1902), and Hao (1924) would seem to in- 
dicate that successive chronological age 
groups of public school children are differen-
tiated on the basis of memory span scores.
From the figures in Table XI it appears that 
growth in memory span continues with some 
regularity until the sixteenth year, but it is 
by no means certain that a limit in memory 
span development is reached at sixteen years 
since the subjects investigated were not 
always representative of the age groups 
studied.⁵

5. Are letters reported with greater fre- 
quency at certain positions than at others 
within the stimulus series? It will be shown 
later that the results of the analysis which 
was made to answer this question have valu-
able implications relative to the way in which
memory span might be related to reading 
ability. 

The studies of Crosland and Johnson 
(1928) and Crosland (1939) showed that 
there is a progressive decrease from left to 
right in the recall value of letters within a 
series presented tachistoscopically. The first 
study employed some 30 adult subjects and 
the later one dealt with a highly selective 
group of third-grade children. In the present 
research the comparative letter position scores 
from each of three different tests of memory 
span are analyzed. These data should give 
some indication as to the process of growth 
in the memory span functions involved in the 
three tests. Specifically, is the tendency for 
the subjects to report with greatest frequency 
the letters appearing at certain positions the 
same, or different, for the three tests? Can 
growth in the functions measured be described 
in terms of a characteristic change in the 
forms of the letter position curves from 
grade to grade? 
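A letter position curve of the kind referred to above can be sketched as the proportion of trials on which the letter at each position was reported in its correct place. The code below is only an illustration under that assumption; the function name, the stimulus words, and the reports are hypothetical.

```python
def letter_position_curve(trials):
    """Proportion of trials in which each position was correctly reported.

    Each trial is a (stimulus, report) pair; the report is a list with
    one entry per position and None where the pupil left a blank.
    """
    n_positions = len(trials[0][0])
    hits = [0] * n_positions
    for stimulus, report in trials:
        for i, (shown, written) in enumerate(zip(stimulus, report)):
            if written == shown:
                hits[i] += 1
    return [h / len(trials) for h in hits]

# Two hypothetical nine-letter trials for one pupil.
trials = [
    ("agrtlnpsm", ["a", "g"] + [None] * 7),
    ("bdfhkqvxz", ["b"] + [None] * 8),
]
print(letter_position_curve(trials))
```

Computing one such curve for each test and each grade would allow the forms of the curves to be compared from grade to grade.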


⁵ For example, Hao had seventeen and eighteen year old
children classified as of eighth grade ranking. It is uncommon
to find children of this age in the eighth grade.


6. Finally, what direct relationships exist
between memory span scores and scores on
standardized reading tests? Both the correla-
tion method and the technique of comparing
mean differences of extreme groups will be 
employed. It is a well known fact that tests 
of memory span have been used to measure 
mental ability for a relatively long period of 
time. As early as 1887 Jacobs administered 
his “memory span test” to English school
children. Galton (1897) demonstrated that 
the “prehension of idiots” was definitely in- 
ferior. Bolton’s (1892) early study of audi- 
tory memory span showed that it increased 
with chronological age rather than with 
growth in school achievement. Bond and 
Dearborn (1917) employed an auditory test 
in their study of the abilities of blind sub- 
jects. It is of interest to ask at this point 
what the possibilities are of using a temporal 
visual test of memory span to measure the 
mental ability of deaf subjects, to whom it is 
obviously impossible to administer an audi- 
tory test. As will be shown in a later section, 
the present study offers new evidence relative 
to the uses which might be made of the 
temporal visual test of memory span as a 
measure of mental ability, not only of deaf 
persons but of normal individuals as well. 
Jones (1925) has offered conclusive evidence 
that the visual and auditory memory spans 
of gifted children are definitely superior. 
Brotemarkle (1924) has reported some evi- 
dence which indicates that memory span 
scores separate the most capable from the 
least capable in mental ability. He based his 
conclusions on the results of testing some 
1200 college men and women. 

The studies of Burt (1909) and Abelson 
(1911) point to a relationship between mem- 
ory and teachers’ estimates of pupils’ abil- 
ities, although their memory tests were in 
reality carefully controlled learning exercises 
and not actually memory span tests. 

The problem of the relationship between 
various measures of memory span and read- 
ing ability appears to have been largely neg-
lected. It is commonly assumed that two im-
portant causes of disability in reading are
limited visual and auditory span, but adequate 
experimental proof for this belief is lacking. 
The few studies which have been conducted 
included correlations at a given grade level 
and conclusions were based on the results of 
testing that one group alone. The present in-
vestigation has included not only correlations
between memory span and rate and compre-
hension in reading ability at each of eight
grade levels, but also a comparison of ex-
treme groups separated first on the basis of
reading ability and, secondly, on the basis of
memory span scores.

II. Experimental Procedure: A. Subjects. 
The experimental group of this study con- 
sisted of 310 pupils selected from the public 
schools of Meriden, Connecticut. In each of 
grades two, three, four, five, six, and eight 
one heterogeneous homeroom division was 
chosen. The first five grade groups were from 
the Samuel Huntington Elementary School; 
the eighth grade was chosen from the Lincoln 
Junior High School. From an alphabetical 
listing of the entire tenth and twelfth grade 
enrollment, the high school subjects were 
chosen at random. Since there were approx- 
imately 400 pupils in each class, every fourth 
name was checked as a subject. Descriptive 
data on the mental and reading ability of the 
subjects in each grade are given in paragraph 
C. The grade and sex distributions of the 
experimental population are presented in 
Table I. 
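The selection of high school subjects described above amounts to taking every fourth name from an alphabetical list. The sketch below illustrates that step only. The roster, the function name, and the choice of the fourth name as the starting point are assumptions, and the count it gives is of names checked rather than of pupils finally tested.

```python
def every_kth_name(names, k=4):
    """Check every k-th name on an alphabetical listing.

    Assumption: counting starts so that the k-th name is the first one chosen.
    """
    alphabetical = sorted(names)
    return alphabetical[k - 1::k]

# Hypothetical roster of about 400 pupils in one class.
roster = [f"Pupil{i:03d}" for i in range(400)]
checked = every_kth_name(roster)
print(len(checked), checked[:3])
```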

B. Laboratory Tests of Memory Span. 
Three laboratory tests of memory span were 
administered to the experimental group in 
their regular classrooms. Only one laboratory 
test per day was given to any one group, ex- 
cept in the case of grades ten and twelve, 
where the three laboratory tests were admin- 
istered on the same day, sufficient time being 
given for rest between tests. The order of 
presenting the memory span tests was rotated 
so that practice effects would be minimized. 
In grades two, three, and four the order was: 
Test I, Test II, Test III; in grades five, six, 
and eight: Test II, Test III, Test I; in 
grades ten and twelve: Test III, Test I,
Test II.
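The three orders listed above are the cyclic rotations of a single sequence, with one rotation assigned to each block of grades. A minimal sketch of that arrangement follows; the variable and function names are hypothetical.

```python
def rotations(tests):
    """Return every cyclic rotation of the list of tests."""
    return [tests[i:] + tests[:i] for i in range(len(tests))]

orders = rotations(["Test I", "Test II", "Test III"])
grade_blocks = [(2, 3, 4), (5, 6, 8), (10, 12)]

# Map each grade to the order of presentation used for its block.
schedule = {
    grade: order
    for grades, order in zip(grade_blocks, orders)
    for grade in grades
}
for grade in sorted(schedule):
    print(grade, schedule[grade])
```

With this arrangement each test appears once in the first, second, and third position across the three blocks of grades.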


TABLE I
DISTRIBUTION OF SUBJECTS BY GRADE AND SEX

          No. of    No. of
Grade     Boys      Girls     Total
  2        22        14        36
  3        20        22        42
  4        23        17        40
  5        15        29        44
  6        16        27        43
  8        12        17        29
 10        18        20        38
 12        17        21        38
Total     143       167       310


The time required to administer each mem- 
ory span test was between 30 and 40 minutes. 
In the five lowest grades, the subjects were 
taken in groups of 20 to 25. This necessitated 
two presentations for each test in the lower 
grades. A period of instruction and practice 
preceded the test proper, consisting of ten 
practice trials and 20 test trials. Responses 
for both practice and test trials were recorded 
on specially prepared blanks, but only the 20 
test trials were considered in the scoring. 

Description and Method of Administering 
Tests. (1) Test I we shall designate as the 
Tachistoscopic Visual Test. It consisted of a 
motion picture film on which had been photo- 
graphed 30 exposure cards. On each card 
there were nine letters of the alphabet ar- 
ranged in nonsense order. Each stimulus unit 
of this test was presented by means of a 
16 mm. projector, with adjustable speed con- 
trol, directed upon a screen placed on a desk 
in the forepart of the classroom. The dura- 
tion of each exposure was approximately one- 
tenth of a second, thus eliminating all possi- 
bilities of eye movements during the period 
of the exposure. 

After the subjects had written their names 
on the test blanks provided, they were in- 
structed for the Tachistoscopic Visual Test 
in the following way: they were told that 
series of letters would appear on the screen 
for a very brief time. Immediately preceding 
each exposure the subjects’ eyes were di- 
rected to a fixation point, which consisted of 
an X placed in a position where the middle 
letter of each stimulus unit subsequently ap- 
peared. They were encouraged to catch as 
many of the nine letters as possible, and to 
report them in writing on the spaces of the 
blank which corresponded to the position 
which the remembered letters occupied within 
each series as they appeared on the screen. 
Two additional points were made. It was per- 
missible to omit spaces between letters re- 
ported; and all letters seen should be reported, 
even when they could not be placed in the 
correct order. All instructions were carefully 
explained and illustrated as far as possible on 
the blackboard during the practice period. 

(2) Test II, the Auditory Test, consisted 
of another series of 30 nine-letter nonsense 
words recorded for phonographic presentation. 
A trained speaker was employed for the 
recording. The Auditory Test was presented 
by means of a portable electrical phonograph, 
with a constant speed of 78 revolutions per 


minute, attached to a radio, which made pos- 
sible the optimum volume necessary for the 
sound to reach subjects in all parts of the 
classroom. The interval between two succes- 
sive letters of any one series was approxi- 
mately one second, the total time per series 
being 17 seconds. Ample time was given after 
each group of letters to permit all subjects to 
record their responses. 

The following points were emphasized in 
the instructions: The subjects were told that 
groups of letters would be heard, and that 
there was to be no writing while the phono- 
graph was playing. At the end of each series 
of letters they were instructed to report in 
writing as many letters as they could remem- 
ber on the spaces of the blanks which corre- 
sponded to the positions that the remembered 
letters occupied within each series as it was 
heard. Again, it was stated that it was per- 
missible to omit spaces if the letters could not 
be reproduced consecutively, and that all 
letters remembered should be reported, even 
though they could not recall the correct order. 

(3) Test III, the Temporal Visual Test, 
was composed of still another series of 30 
nine-letter nonsense words photographed on 
motion picture film in such a way that when 
the film was shown to the subjects, they saw 
the letters of each series singly. The letters
of each series were separated by intervals of 
approximately one second, the total time per 
series being 17 seconds. 

The instructions were similar to those pre- 
ceding the Auditory Test. There was to be no 
writing while the film was being run. Letters 
were to be reported by correct position as in 
the case of the two previous tests. It was 
again explained that spaces might be omitted 
in the report, if intervening letters could not
be recalled, that is, in some cases the first 
and last letters might be reported in the first 
and last spaces, respectively, with no letters 
appearing in the intermediate positions. The 
subjects were also instructed to report all the 
letters they could remember even when the 
correct order could not be preserved. A 
“ready” signal prepared the subjects for the 
first letter of each series of nine letters, in the 
case of both the Auditory and Temporal 
Visual Span Tests. 

Methods of Scoring. Three measures of 
memory span were computed for each test. 

1. A gross span score (GS), defined in each 
test as the number of letters reported which 




were included in the particular nonsense word 
presented. 

2. An accuracy of placement score (AP),
defined as the number of letters in the gross
span score which were identified correctly
with respect to position.

3. A range score (R), defined as the
widest range over which letters were correctly
reported. An example will serve to illustrate
the scoring technique employed. Let us
assume that


agrftlonpsw*m 


were the nonsense word presented by means 
of either the projector, or phonograph, and 
the subject caught 


ag ] ps m 
the report might be 

8 


In the above trial the gross span score would 
be six, because all of the six letters reported 
occurred in the stimulus unit presented. The 
accuracy of placement score would be three, 
because only “a”, “s”, and “m” of the six
letters reported by the subject were identified 
correctly by position. The range score would 
be nine, because nine is the widest range over 
which letters were correctly placed in the 
report. 
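The three scores defined above can be sketched in code. This is a minimal illustration, not the authors' procedure: the nine-letter stimulus and the report below are hypothetical, chosen only so that the result reproduces the scores in the worked example (gross span six, accuracy of placement three, range nine). The function name, the use of `_` for an empty space on the blank, and the rule that each stimulus letter is credited at most once are all assumptions.

```python
def score_trial(stimulus, report, blank="_"):
    """Return (gross_span, accuracy_of_placement, range_score) for one trial.

    stimulus: the letters presented, in order.
    report:   the subject's written answer, one character per space on the
              blank, with `blank` marking a space left empty (assumed format).
    """
    if len(stimulus) != len(report):
        raise ValueError("report must have one space per stimulus position")

    # Gross span: reported letters that occurred anywhere in the stimulus.
    # Assumption: each stimulus letter may be credited only once.
    remaining = list(stimulus)
    gross_span = 0
    for letter in report:
        if letter != blank and letter in remaining:
            remaining.remove(letter)
            gross_span += 1

    # Accuracy of placement: reported letters written in their true position.
    hits = [i for i, (s, r) in enumerate(zip(stimulus, report))
            if r != blank and r == s]
    accuracy_of_placement = len(hits)

    # Range: inclusive distance from the first to the last correctly
    # placed letter; zero when nothing was placed correctly.
    range_score = hits[-1] - hits[0] + 1 if hits else 0

    return gross_span, accuracy_of_placement, range_score


# Hypothetical stimulus and report: six letters reported, three in position.
print(score_trial("agrtlnpsm", "a_gl_p_sm"))
```

Under these assumptions the hypothetical report earns credit for six letters, three of them in position, with the correctly placed letters spanning the first through the ninth space.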

Twenty trials, each trial constituting a
stimulus unit of nine letters, were scored.
Three scores were recorded for each test trial.
This was repeated for the twenty trials and
averages were calculated. The highest average
score possible according to any method of
scoring was nine.

C. Reading and Intelligence Tests. Suitable
tests of silent reading ability and general
intelligence were administered to all the
subjects. The titles of these tests are listed in
Table II. In grades two, three, four, five, six,
and eight the reading grade equivalents of the
silent reading comprehension raw scores were
used. They were converted into New Stanford
Reading equivalents, and are reported
in Table III. The reading comprehension
scores for the subjects in grades ten and
twelve are given in terms of Iowa Silent
Reading Comprehension equivalents and appear
in Table IV. In converting scores of one
distribution into equivalents of another, the


December, 1939] 


VISUAL AND AUDITORY MEMORY SPAN 


213 


TABLE II 
READING AND INTELLIGENCE EXAMINATIONS ADMINISTERED TO SUBJECTS 


Grades Reading Tests Intelligence Tests 
YS New Stanford Reading, Form W Kuhlmann—Anderson (Grade 2) 
Metropolitan Primary Reading, Form A Haggerty Delta 1 
New Stanford Reading, Form W Kuhlmann—Anderson (Grade 3) 
Metropolitan Primary Reading, Form A Haggerty Delta 1 
Shank Tests of Silent Reading Dearborn C 
Comprehension, I, Form A 
New Stanford Reading, Form W Kuhlmann—Anderson (Grades 4, 
Metropolitan Reading, Adv., Form B 5, 6 respectively) 
Shank Tests of Silent Reading Dearborn C 
Comprehension, I, Form A Otis Self-Administering, 
Inter. Form A 
§ __________. New Stanford Reading, Form W Kuhlmann—Anderson (Grade 8) 


Metropolitan Reading, Adv., Form B 
Shank Tests of Silent Reading 
Comprehension, II, Form A 
1 Shank Tests of Silent Reading 
Comprehension, III, Form A 

Monroe Silent Reading, III, Form A 

Iowa Silent Reading, Adv., Form A 


TABLE III 


MEANS AND STANDARD DEVIATIONS OF COMPOSITE
NEW STANFORD READING W
READING GRADE SCORES OF SUBJECTS
IN GRADES TWO TO EIGHT


Grade 2 3 4 5 6 8 
N 36 42 40 44 43 29
2.98 3.48 4.60 6.01 6.88 


49 1.03 1.54 1.67 


TABLE IV 


MEANS AND STANDARD DEVIATIONS OF COMPOSITE
IOWA SILENT READING COMPREHENSION
RAW SCORES OF SUBJECTS
IN GRADES TEN AND TWELVE


Grade 10 12 
M 96.50 150.58 


method suggested by Hull (1928) was
employed.*

* The technique is as follows: M, mean of original series;
σ, SD of original series; X, score of original series;
M′, mean of new series; σ′, SD of new series; X′, score to be
found in the new series.

X′ = K + SX

The grade distributions of the reading
comprehension scores are more or less typical
of what has generally been found. Variability
in performance, as measured by the standard
deviation of the New Stanford Reading equivalent
scores, tends to increase from grade to
grade, in this case reaching its highest point
at grade six. The distributions for grades ten


Dearborn C 
Otis Self-Administering, 
Inter. Form A 


Kuhlmann—Anderson (IX to 
Maturity) 

Otis Self-Administering, 
Higher, Form A 

Terman Group Tests, Form A . 


and twelve, where composite Iowa Silent
Reading scores were used, show considerable
variability also, the variability being greater
in grade twelve.
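The conversion of scores from one distribution into equivalents of another, given in the footnote as X′ = K + SX, can be illustrated with a short sketch. The definitions of the two constants are not legible in the footnote, so the code assumes the usual linear form: S is the ratio of the new standard deviation to the original one, and K is chosen so that the original mean maps onto the new mean. The function names and the numbers in the example are hypothetical.

```python
def hull_constants(mean_old, sd_old, mean_new, sd_new):
    """Return (K, S) for the linear conversion X' = K + S * X.

    Assumption (not stated legibly in the source footnote):
        S = sd_new / sd_old
        K = mean_new - S * mean_old
    so that converted scores have the mean and SD of the new series.
    """
    s = sd_new / sd_old
    k = mean_new - s * mean_old
    return k, s


def convert_score(x, k, s):
    """Convert one original-series score into a new-series equivalent."""
    return k + s * x


# Hypothetical values: raw scores with mean 50 and SD 10 are expressed on
# a grade-equivalent scale with mean 4.0 and SD 1.0.
k, s = hull_constants(50.0, 10.0, 4.0, 1.0)
print(convert_score(60.0, k, s))  # a score one SD above the original mean
```

With these illustrative constants, a raw score one standard deviation above the original mean lands one standard deviation above the new mean, which is the property the conversion is meant to preserve.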

The speed of silent reading scores reported
in Table V correspond to the number of exercises
attempted on the New Stanford Reading
and on the Metropolitan Reading Test. Here
again, considerable variability in test scores is
to be observed, the greatest variability being
in grade five. In the two upper grades a composite
of the speed scores from the Iowa Silent
Reading Test and the Monroe Silent Reading
Test was used (Table VI). Variability in this
case was more pronounced in grade ten than
in grade twelve.


TABLE V


MEANS AND STANDARD DEVIATIONS OF SILENT
READING SPEED* SCORES OF SUBJECTS
IN GRADES TWO TO EIGHT


Grade   2      3      4      5       6       8
N       36     42     40     44      43      29
M       38.72  52.26  74.50  111.95  126.91  137.76
SD      11.22  17.01  25.48  27.09   23.02   21.19


* Number of exercises attempted in New
Stanford Reading W, and Metropolitan Reading
Adv. B.


TABLE VI


MEANS AND STANDARD DEVIATIONS OF SILENT
READING SPEED SCORES* OF SUBJECTS
IN GRADES TEN AND TWELVE


Grade   10     12
N       38     38
M       59.08  67.92
SD      11.08  8.52


* Sum of Iowa S Score and Iowa Equivalent
of Monroe Silent Reading Speed Score.


TABLE VII


MEANS AND STANDARD DEVIATIONS OF COMPOSITE KUHLMANN-ANDERSON I. Q.'S FOR
310 SUBJECTS GROUPED BY GRADE


Grade 2 3


9.70

The intelligence quotients given in Table 
VII are composites of quotients based on the 
mental tests given in each grade and are ex- 
pressed as Kuhlmann—Anderson equivalents. 
An inspection of this table will show that 
except for the somewhat greater variability 
of the scores in grade four and the lower mean 
score in grade eight, there were, in general, 
no conspicuous differences from grade to 
grade in mean intelligence quotient or in the 
variability of this measure. 

The reading and intelligence examinations 
were administered during the months of 
October, November, and December of 1937. 
All tests were administered during the regular 
school day. Care was exercised in spacing the 
tests so that the subjects would not be
fatigued by too long a period of testing, and 
the testing situations were kept as natural and 
school-like as possible. 


III. Results: A. Reliability of the Memory 
Span Tests. The reliability coefficients re- 
ported in Table VIII were computed by the 
split-halves technique, scores on the even- 
numbered trials being correlated with scores 
on the odd-numbered trials. The reliabilities 
for the three methods of scoring each test 
were determined at each of the eight grade 
levels investigated. Considering only the fig- 
ures which have been corrected for the full 
length of the test, these coefficients were 
rather high, ranging from .645 to .973. 
Thirty-one percent of these 72 correlations 
are above .900; 76 percent are above .800.
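The split-halves computation described above, with the correction for the full length of the test, can be sketched as follows. The sketch assumes that each subject's record is a list of per-trial scores and that the two half-scores are simple sums over odd-numbered and even-numbered trials; the function names and the sample data are hypothetical. The step-up uses the standard Spearman-Brown formula, r_full = 2r / (1 + r), in its general form for a test lengthened or shortened by any factor.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r, factor=2.0):
    """Reliability predicted for a test `factor` times as long.

    factor=2.0 steps a half-test correlation up to full length;
    factor=0.5 predicts the reliability of a test cut to half length.
    """
    return factor * r / (1.0 + (factor - 1.0) * r)


def split_half_reliability(records):
    """records: one list of per-trial scores per subject (assumed layout).

    Returns (raw half-test correlation, coefficient stepped up to full length).
    """
    odd = [sum(trials[0::2]) for trials in records]   # trials 1, 3, 5, ...
    even = [sum(trials[1::2]) for trials in records]  # trials 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return r_half, spearman_brown(r_half)


# Hypothetical records for four subjects, four trials each.
records = [[3, 4, 5, 4], [6, 5, 6, 7], [8, 9, 9, 8], [2, 2, 3, 1]]
r_half, r_full = split_half_reliability(records)
print(round(r_half, 3), round(r_full, 3))
```

The same `spearman_brown` function with `factor=0.5` gives the reliability to be expected if a test were cut to half its length, which is the comparison at issue whenever raw and corrected coefficients are reported side by side.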

An inspection of the reliability coefficients 
reveals that they differ from grade to grade, 
by method of scoring, and from test to test. 
As a battery, the test yielded the highest reli- 
ability coefficients for grade two, where they 
range from .854 to .973. At this grade level, 
seven of the nine coefficients are above .900.
The coefficients are lowest for grade six where 


4 5 6 8 0 


40 44 43 29 38 38 
100.73 103.80 105.00 94.23 100.31 110); 
16.87 10.18 11.64 12.13 13.62 147 


they vary from .645 to .852. At the other 
grade levels the sizes of the reliability
coefficients vary more or less irregularly.


It is gratifying to find that the reliability 
of these tests is highest at the second grade 
level because, as will be shown in a later
paragraph, it is in this grade that the diagnostic
value of the test battery may prove of greatest 
worth for clinical use. 


It is pertinent here to compare the results 
of this study with the data obtained by 
Tinker (1932) in a study in which coefficients 
of reliability were computed for three methods 
of scoring his test records. This investigator 
found that a system of scoring which credited 
responses regardless of conformity to the 
order of the stimulus pattern is the most reli- 
able. This method of scoring (gross span 
scores) was found to be most reliable in the 
present study also. Forty-two percent of the 
gross span scores yielded reliability coefficients
above .900, as compared with only 25
percent of the coefficients for the accuracy of
placement and range scores which reached or
exceeded .900. It should be stated, however,
that the range scores are just as reliable as 
the gross span scores in the Temporal Visual 
Test. Accuracy of placement scores tended to 
be more reliable in the two visual tests than 
in the Auditory Test. 

An inspection of Table VIII will reveal 
that the Temporal Visual Test, taken as a
whole, is the most reliable, followed in order
by the Tachistoscopic Visual Test and the 
Auditory Test. 

The significance of test reliability is well
understood. Unless a test is highly reliable,
it cannot be used with confidence as a test for
individual diagnosis of the function measured.
Kelley (1927) has stated that for individual
use, the reliability coefficient of a test should
be at least as high as .94, but for group
measurement a test with a reliability as low
as .50 may have some value. As is shown by
the raw coefficients in Table VIII, our test
could be reduced to half its present length



TABLE VIII 
RELIABILITY* COEFFICIENTS OF LABORATORY TESTS 


I.Tachistoscopic Visual Test 
aS R 


II, Auditory Test 


III. Temporal Visual Test 
Gs 


srede Scores Scores Scores Scores Scores Scores Scores Scores Scores 
2 2856 e801 2745 e871 2852 2869 28355 2902 2948 
(se) 9324920 4930 4908) 948.073 
3 695 2537 2749 669 2701 2843 2850 871 
(N=42) 2901 2820 2699 2856 2802 8240 2919 , 951 
4 2740 2745 2652 2837 2653 «670 e761 e777 e715 
(Ne40) 2854 2774 2911 2790 802.864 2875 
5 2793 863 e733 2695 2747 «659 2849 2824 2861 
(¥=44) 2885 2926 2846 2820 2855 3794 918 2904 925 
é 2742 2584 0476 709 e722 2646 2697 2634 
(R43) 852 737 2645 830 2748 2776 
8 2820 2602 «759 «577 «556 2784 815 
(N29) 794 2901 2752 2865 2752 5715 898 
10 2823 «750 2734 2744 2699 2814 808 «850 
(¥-38) 2857 2914 2847 53,823 2894 919 
12 «850 2694 708 «620 2656 2537 2844 
(N-38) 919 2819 2765 2792 2699 2915 2775 


* Underlined coefficients result from stepping up raw coefficients according to Spearman— 
Brown “prophecy formula”. See T. L. Kelley, Statistical Method, p. 206. 


and still be a usable test for group measure- 
ment. 

The data on the reliability of the memory
span tests used in this study may be summarized
as follows: these tests were found to be
as reliable as tests employed in previous
investigations, if not more reliable. These
rather encouraging results are explained at 
least in part by the greater standardization of 
procedure made possible by the use of the 
film and photographic techniques. The mem- 
ory span tests are most reliable for grade two 
and least reliable for grade six. The gross 
span scores were consistently the most reli- 
able. The test, regardless of method of scor- 
ing, which possessed the highest reliability 
coefficients, was the Temporal Visual Test. 

B. Relationship between Methods of Scoring
the Same Test of Memory Span. In Table
IX are reported three coefficients of intercor- 
relations between the three methods of scor- 
ing each test of memory span at the eight 
grade levels investigated. The differences in 
the sizes of the coefficients are considered (1) 
by memory span test and (2) by grade. 

The correlations between the methods of 
scoring the Tachistoscopic Visual Test range 
from .17 to .95. The median coefficient is .67. 
The corresponding figures for the Auditory 


Test do not differ markedly from those of the 
Tachistoscopic Visual Test. They range from 
.29 to .94, the median coefficient being ap- 
proximately .70. The Temporal Visual Test 
yielded the highest intercorrelations between 
the three methods of scoring. The coefficients 
range from .53 to .90, the median being .72.


The highest coefficients in Table IX were 
obtained when accuracy of placement and 
range scores were correlated. This finding 
holds true for all tests. Of the 24 correlations 
between these measures, 71 percent are above 
.80, whereas only 25 percent and 8 percent of 
the same number of correlations between 
gross span and accuracy of placement scores, 
and those between gross span and range 
scores, respectively, equal or exceed .80. 


In summary, there were no large differences 
in the correlations between the three methods 
of scoring from test to test. Whatever differ- 
ences existed favored the Temporal Visual 
Test. No consistent differences in the correla- 
tions by grade were obtained. The two meth- 
ods of scoring which were consistently most 
highly correlated were the accuracy of place- 
ment and range scoring methods. This is 
attributable to the fact that range scores were 
always at least as large as accuracy of place- 


TABLE IX 
INTERCORRELATIONS AMONG THE THREE MEASURES OF MEMORY SPAN 
WITHIN EACH LABORATORY TEST* 

I. Tachistoscopic Visual Test II. Auditory Test III.Temporal Visual Tes 

Grade Scores Scores Soores Scores Scores Scores Scores Scores Scores 
2 2854.03 .92¢.02 2614.07 .592.07 2942.01 542.08 2534.08 
N-36 
3 ~624.06 0722.05 .862.05 2634.06 .462.08 .844.03 2814.04 .61+4.06 2752.04 
(N-42) 
) 484.08 2582.07 .914.02 294.09 2434.08 .814.04 2764.04 .614.06 0743.8 
5 -48+.08 2674.06 2652.06 .49%.08 070 2005 4624.06 .864,@ 
(N-44) 
6 562.04 2649.04 .90%.02 2804.04 .67%.06 .854.05 2584.07 .56+.07 784.6 
(N-43) 
3 -63%.08 .75%.06 .954.01 .29¢.11 4764.05 .612.08 .564.12 .80+,% 
(N-29 
10 ) «434.09 017#.10 .354.10 «89 402 2854.05 2844.05 2809.04 2624.07 839.8 
(N-38 
12 2664.06 4704.06 .624.07 2854.05 4684.06 .764.05 .902.02 .694,06 
(N-38 


* The abbreviations GS, AP, and R stand for Gross
Span, Accuracy of Placement, and Range, respectively.


ment scores, which means that the latter were
conditioned by the former.


One further statement should be made relative
to the findings discussed in the first two
sections of this division. It has been found
that gross span scores are the most reliable,
but they cannot be used to the exclusion of
the other scores. Since the intercorrelations
between methods of scoring the same test are
not uniformly high, it appears that even when
a given test is scored in several ways, somewhat
different functions are measured, especially
when either accuracy of placement or
range scores are correlated with gross span
scores.


Perhaps some method of scoring which
takes into account both accurately and inaccurately
placed items would give the most
representative measures of memory span.
Tinker’s (1932) method of scoring, in which
he gives a weight of one to accurately placed
items and a weight of one-half to inaccurately
placed items, would meet this condition. His
finding, that such a method of scoring is not
only the most reliable but is also highly correlated
(in the neighborhood of .90) with
other methods of scoring, does not provide a
conclusive answer to the present problem,
since, as has been remarked above, his test
was neither tachistoscopic nor temporal but a 
combination of both. 
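A weighted score of the kind attributed to Tinker, with a weight of one for accurately placed items and one-half for inaccurately placed items, might be computed as in the sketch below. Only the two weights come from the text; the function name, the `_` marker for an empty space, the rule that letters absent from the stimulus earn no credit, and the sample stimulus and report are assumptions made for illustration.

```python
def weighted_span_score(stimulus, report, blank="_"):
    """Weighted score: 1 per correctly placed letter, 0.5 per misplaced one.

    A "misplaced" letter is one that occurred in the stimulus but was
    written in the wrong space. Letters not in the stimulus earn nothing
    (assumption; the source does not say how such letters were treated).
    """
    if len(stimulus) != len(report):
        raise ValueError("report must have one space per stimulus position")

    remaining = list(stimulus)
    correct = 0
    misplaced = 0

    # First pass: credit letters written in their true position.
    for s, r in zip(stimulus, report):
        if r != blank and r == s:
            correct += 1
            remaining.remove(r)

    # Second pass: half credit for stimulus letters written elsewhere.
    for s, r in zip(stimulus, report):
        if r != blank and r != s and r in remaining:
            misplaced += 1
            remaining.remove(r)

    return correct * 1.0 + misplaced * 0.5


# Hypothetical trial: three letters in position, three out of position.
print(weighted_span_score("agrtlnpsm", "a_gl_p_sm"))
```

A score of this form lies between the accuracy of placement score and the gross span score for the same trial, which is why it is a natural candidate for a single representative measure.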


C. Relationship Between Visual and Audi- 
tory Spans. In the above section the inter- 
correlations between the three methods of 
scoring each test were presented. We now 
proceed to the problem of the relationship 
which exists between span scores from test to 
test, when the method of scoring is constant. 
In order to attack this problem, the inter- 
correlations between comparable scores on the 
three tests have been computed, and are listed 
in Table X. Although it has been customary 
to evaluate correlation coefficients in terms of 
their probable errors, for the purposes of the 
present research such a procedure would be 
inadequate because, even though each individual
coefficient of an array may not in
itself be significant, a group of such correlations
may be significant (Fisher, 1934). The
z-score technique of Fisher, which permits
the arithmetical manipulation of coefficients,
is employed in summarizing the data where
such a recapitulation cannot otherwise be
made. The z-value equivalents may be secured
directly from Fisher’s tables; the
standard error of z, independent of the corre-



TABLE X 
INTERCORRELATION AMONG SIMILAR MEASURES SECURED FROM THE THREE LABORATORY TESTS* 


Gross Span Scores Accuracy of Placement Scores Ran Scores 


Grade 

(f22.19) 2 33 «627 037 22 227 227 e32 219 226 

r 242 ell 259 44 28 247 240 

2 245 oll e41 229 51.39 242 

4 Tr 44 256 2354 221 229 259 

(@-.19 47 258 235 240 e21 222 250 224 259 209 250 351 

\ 

5 r 46 .24 249 255 238 227 224 

(22.16) 2 224 054 40 -28 15 024 422 

6 Tr 257 266 222 270 204 67 

228 257 242 -.12 ese 49 

(22-20) z +29 65 45 09 229 245 -.12 255 54 255 52 

10 r 18 252 222 253 e22 205 203 226 

z 18 255 226 222 222 226 205 12 

12 r 253 254 246 05 015 e22 -.13 219 

(22.17) 2 «34 050 .40 .05 25 222-413 019 409 
nz 028 249 58 223 027 222 15 240 225 


* I, II, and III represent Tachistoscopic Visual, Auditory, and Temporal Visual Tests,
respectively. Vertical averages are underlined; mz, mean z value; Mz, mean z value for entire
horizontal array.


lation coefficient, is expressed in the formula: 
$\sigma_z = 1/\sqrt{N' - 3}$, where N' is the number of
paired observations. For the purposes of the
present analysis, we shall assume that any z
values which exceed three times their respective
standard errors are statistically significant.
An inspection of the coefficients in
Table X will immediately reveal that they
are, on the whole, low. Only 9 of the 72 correlations,
or 12½ percent of them, are large
enough to meet the criterion set forth above.
However, it cannot be concluded that the 
coefficients taken as a whole are not signifi- 
cant; it should be stated that there is evidence 
of a significant correlation at a low degree of 
relationship. In other words, the correlations, 
although low, cannot be explained as chance 
deviations. 
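The z technique and the significance criterion described above can be sketched in a few lines. The sketch assumes Fisher's standard forms, z = arctanh(r) with a standard error of 1/√(N′ − 3) for N′ paired observations, and it applies the criterion stated in the text (z greater than three times its standard error). The function names are hypothetical, and the standard error of a mean of several z values is computed on the assumption that the z values are independent, which is a simplification for correlations drawn from the same subjects.

```python
import math


def fisher_z(r):
    """Fisher's z equivalent of a correlation coefficient."""
    return math.atanh(r)


def standard_error_z(n_pairs):
    """Standard error of z for n_pairs paired observations (assumed form)."""
    return 1.0 / math.sqrt(n_pairs - 3)


def exceeds_criterion(r, n_pairs, multiple=3.0):
    """True when |z| exceeds `multiple` times its standard error."""
    return abs(fisher_z(r)) > multiple * standard_error_z(n_pairs)


def mean_z(correlations):
    """Average a set of correlations on the z scale."""
    zs = [fisher_z(r) for r in correlations]
    return sum(zs) / len(zs)


def se_of_mean_z(n_pairs, k):
    """Standard error of the mean of k z values, each based on n_pairs
    observations, treating the k values as independent (assumption)."""
    return standard_error_z(n_pairs) / math.sqrt(k)


# Illustration with a group of 44 subjects.
print(round(standard_error_z(44), 2))   # standard error of a single z
print(exceeds_criterion(0.46, 44))      # a correlation of .46
print(exceeds_criterion(0.24, 44))      # a correlation of .24
```

Averaging on the z scale rather than averaging the raw coefficients is what allows a whole array of individually small correlations to be tested as a group.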

In so far as these figures permit, a three- 
fold analysis of these coefficients will now be 
made in order to answer the following ques- 
tions: (1) What is the influence of method 
of scoring on the relationships between visual 
and auditory memory spans? (2) What are 
the relative effects of similarity in method of 
presentation, and comparability of sense 


modality? (3) Do the correlations vary con- 
sistently as a function of grade level? 

(1) It will be recalled that differences in 
reliability’ were explained in part by method 
of scoring. An inspection of Table X reveals 
that, for the purposes of considering the re- 
lationships between tests of visual and audi- 
tory spans, method of scoring must be taken 
into account as a contributing factor. Since 
gross span scores were the most reliable, the 
results secured from this measure are consid- 
ered first. It was found that the relationships 
between various spans, when expressed in 
terms of the correlation between gross span 
scores for the two tests, are higher than the 
correlations between accuracy of placement 
scores, or range scores. The mean z value for 
the 24 correlations calculated between gross 
span scores for pairs of spans at the eight 
grade levels was .38, as compared with .31 
for the accuracy of placement scores and with 
.25 for the range scores. Four of the coeffi- 
cients in the array calculated for the gross 
span scores possessed equivalent z values 
which exceeded three times their respective 
standard errors, these correlations ranging 

"Cf. Table VIII. 




from .46, between the Tachistoscopic and 
Auditory Tests in grade five, to .66, between 
the Auditory and Temporal Visual Tests in 
grade six. The two individually significant 
correlations based on accuracy of placement 
scores occurred between the two serial tests. 
In grade six, the coefficient was .70, as com- 
pared with .50 in grade five. When the cor- 
relations between visual and auditory span 
were computed on the basis of range scores, 
two significant coefficients were secured; 
these were .47, between the Tachistoscopic 
and Auditory Tests in grade three, and .67, 
between the two serial tests in grade six. 


In one case it was possible to make a
comparison between the present results and 
those reported by Hao (1924) who found a 
correlation of .39 between temporal visual 
and auditory spans. His subjects were 83 
sixth-grade pupils. The present coefficient 
representing approximately the same relation- 
ship (Tests II and III, accuracy of placement 
scores) is .70, and was secured from 43 sixth- 
grade subjects. Although Humpstone (1919), 
Gates (1916), and Bennett (1916) have 
studied the relationship between visual and 
auditory span, a comparison of the present 
results with theirs would be essentially mean- 
ingless because of differences in procedures 
employed. 

(2) In answer to the second question asked 
above, it may be stated that constancy in 
mode of presentation results in higher cor- 
relation coefficients than does the factor of 
identical sense field. The correlations between 
the Auditory and Temporal Visual Test 
measures yielded the highest coefficients ap- 
pearing in Table X. This statement is based 
on the mean z values obtained from intercor- 
relations. The mean z value for the correla- 
tions between the Auditory and Temporal 
Visual Tests was .44, as compared with .27, 
between the Tachistoscopic and Auditory 
Tests, and .23, the mean z value of the cor- 
relations between the Tachistoscopic and 
Temporal Visual Tests. 


The correlations between the Tachisto- 
scopic Visual Test and the Auditory Test do 
not differ markedly from the correlations be- 
tween the Tachistoscopic Visual Test and the 
Temporal Visual Test. Both groups of co- 
efficients are low when taken singly, but when 
taken as a group their significance, albeit at 
a low level of relationship, is high. That is, 




chance alone could not account for the 
obtained figures. 

(3) It was indicated immediately above
that the correlations between the scores on
the Auditory Test and Temporal Visual Test,
which were presented in the same way, were
higher than the correlations between scores
on the Tachistoscopic Visual Test and either
the Auditory or Temporal Visual Tests. This
finding may be explained in terms of the fact
that a common type of response was possible
when the items of the two tests involved were
presented serially, whereas when the items of
one are presented serially and the items of
the other tachistoscopically, the possibility of
making a common type of response to the
two tests is reduced, despite the fact that the
material was presented to the same sense
field. For a response does not occur within a
sense organ, but in a muscle or group of
muscles. Defined in this manner, the responses
involved were probably of the speech organs. 
As the letters were seen and heard in the two 
serial tests, the subjects might have tended 
to reproduce them subvocally. The extent to 
which these subvocal responses were still 
operative when the reports were made would 
determine the size of spans when letters were 
seen or heard, the common speech response 
being responsible for the relationship between 
the two tests. The lower correlations between 
the Tachistoscopic Visual Test and either of 
the other two are, in terms of this theory, to 
be explained by the fact that the opportunity 
for making the common type of response is 
lacking.* Since it is a frequently proposed 
theory that children tend to vocalize their 
responses more as they grow older, it was 
deemed worthwhile to study the interrelation- 
ships between the two serial tests by grade in 
order to discover whether the correlations in- 
creased with increased grade level. The re- 
sults of this analysis were essentially incon- 
clusive. The correlations in terms of the gross 

*It has been stated above that the correlations between 


visual and auditory memory span are affected by certain 
possible a common t¢ of response % 


-— number of associations among the items of 4 sent. 
other words, the correlation between visual and auditory 
is higher when comparable mean ay 
ployed in the tests. Henmon (1912) found a correlation 
.70 between visual and audi pan 7] 
tests were co of series of related nouns, and of 2 
when the tests included series of unrelated syllables. Abelson’ 
(1911) correlation between memory spans when sentence 


employed . complete treatments of this 

reader is referred to the reviews by Tinker (1929) a4 
Blankenship (1938). 


to sense 
periment it has been observed that serial presentation ¥% 
such a condition which produced higher correlations. Another 


span scores tend to increase from grades two
to six, at which point they drop somewhat.
When the comparison was made on the basis
of accuracy of placement scores, the correlations
increased again, being lowest at grade
two and highest at grade six. It seems significant
to note that the next highest correlation
in this group occurred at grade twelve,
which, it should be remembered, was the
highest grade studied. In the case of the
range scores, the correlations vary so irregularly
from one grade to another that it is
impossible to evaluate the results in the light
of the present analysis.

There are only slight differences in the correlations
between measures of visual and
auditory memory spans occurring at the
various grade levels investigated. Considering
the inconsistencies in the trend of the coefficients
reported in Table X, it is not safe to
generalize as to the effect of grade level on
the relationships between visual and auditory
memory spans. The data point to one tendency,
namely, the fact that the relationship
between memory spans seems to grow until
the sixth grade, and then it diminishes as we
progress through the higher grades.

The findings of this section are summarized
briefly below:

The highest correlations between scores on
tests of visual and auditory memory span were
secured when the measurements were made in
terms of gross span scores. The accuracy of
placement and range scores yielded approx- 
imately the same order of coefficients. Re- 
gardless of method of scoring, however, the 
correlations between tests of visual and audi- 
tory spans were not, on the whole, signifi- 
cantly high, which can be taken to mean that 
somewhat different functions are measured 
by the three tests. 

In general, the correlations between scores 
on tests employing serial presentation tended 
to be higher than correlations between scores 
on the Tachistoscopic Visual Test and either
the Auditory or the Temporal Visual Test. 
There were no essential differences in the cor- 
relations between the Tachistoscopic Visual 
Test and the Auditory Test, and between the 
Tachistoscopic Visual Test and the Temporal 
Visual Test. 

There was observed no general tendency
for the correlations between visual and auditory
memory span to be influenced by the
grade level at which the correlations were
computed.

The standard errors of the mean z’s for the nine correlations
at the grades studied are .058, .053, .055, .052,
.053, .065, .057, and .057, for grades two through twelve,
respectively.


D. The Influence of Chronological Age, 
Mental Age, and Grade Placement on Mem- 
ory Span. Increases in chronological age, 
mental age,’® and grade placement are accom- 
panied by increases in memory span scores 
for all tests, with few exceptions, regardless 
of method of scoring. 


The mean scores for all subjects when 
grouped according to chronological age are 
presented graphically in Figure 1 which shows 
clearly that the increases from age to age are 
fairly consistent. Certain exceptions can, of 
course, be noted. This is to be expected since 
the number of cases is not sufficiently large 
for most age levels to preclude the possibility 
of deviations from the main tendency on the 
basis of sampling errors alone. When the
graphs of Figure 1 are scrutinized as a whole, 
one cannot fail to observe that there are in- 
creases in memory span scores, by chronolog- 
ical ages, for each method of scoring all tests. 
According to these graphs, growth in the func- 
tions measured by the Auditory and Temporal 
Visual Tests tends to reach a maximum at the 
age of 18 years. In the case of the Tachisto- 
scopic Visual Test the data tend to be more 
inconsistent, although even here it appears 
that the upper limit of growth is reached 
somewhere between the ages of 17 and 18 
years. 

Figure 2 presents the means on the memory
span measures of all subjects classified according
to mental age. When this graphic presentation
is examined, although essentially the
same characteristics of growth are to be observed
here as for the subjects classified
according to chronological age, at least two
important differences are to be noted. A
somewhat greater consistency in growth by 
mental age is shown, especially for the Audi- 
tory and Temporal Visual Tests. Also, the 
age increases continue throughout the entire 
mental age range investigated, with the ex- 
ception of the range scores on the Tachisto- 
scopic Visual Test. Whereas the increases by 
chronological age tended to cease at the 
eighteenth year, increases by mental age per- 
sist through the mental age level of 21 years. 
Whether memory span increases would con- 
tinue beyond this mental age level cannot be 
determined by the present study, since the 


Based on Kuhlmann—Anderson test. 


FIGURE 1. Mean Scores on All Measures of Memory Span for 310 Subjects Classified
According to Chronological Age. (Panels: A. Tachistoscopic Visual Test; B. Auditory Test;
C. Temporal Visual Test. Each panel plots gross span, accuracy of placement, and range
scores against chronological age.)


FIGURE 2. Mean Scores on All Measures of Memory Span for 310 Subjects Classified
According to Mental Age. (Panels: A. Tachistoscopic Visual Test; B. Auditory Test;
C. Temporal Visual Test. Each panel plots gross span, accuracy of placement, and range
scores, as average number of letters, against mental age.)


mental age level of 21 years was the highest 
represented by our subjects. 

One final observation concerning the 
graphs in Figure 2 should not escape notice. 
The characteristics of growth exhibited by the 
Auditory and Temporal Visual Test, as meas- 
ured by corresponding methods of scoring, are 
strikingly similar. The importance of this 
observation is the fact that scores on either 
test are almost equally successful in differen- 
tiating successive mental age levels. The im- 
plication of this finding is the possibility of 
substituting a temporal visual test of memory 
span for an auditory test in testing the intelligence of deaf children. This does not mean that the two tests may be used interchangeably as measures of the same trait, since the correlations between the Temporal Visual and Auditory Tests reported in Table X above were rather low. Ideally, perhaps, tests of both auditory and temporal visual span should be included in an individual intelligence examination.

A comparison of the writer’s mental age norms for auditory and temporal visual memory spans with the age norms obtained by Hao (1924) and Smedley (1902) is presented in Table XI. Terman’s (1916, 1937) figures for auditory span are also included in the comparison. Before proceeding to a discussion of this table, it should be stated that the subjects of the Hao, Smedley, and Terman studies were classified according to chronological age, not mental age. Since these investigators used rather large samplings of subjects," however, it would seem justifiable to assume that the average mental ages of the subject groups corresponded to their chronological ages, so that for practical purposes all of the figures presented may be considered as mental age norms.

TABLE XI
COMPARISON OF AGE NORMS ON TESTS OF VISUAL AND AUDITORY MEMORY SPAN

              Auditory Span                      Visual Span
Age     Present   Hao   Smedley  Terman    Present   Hao   Smedley
        Writer                             Writer*
 7        .8      4.7      5       5         1.5     4.4      5
 8       1.5      4.7      5                 1.8     5.3      5
 9       1.6      5.2      5                 2.8     5.8      6
10       2.0      5.5      6       6         2.9     6.3      6
11       2.0      6.1      6                 3.0     7.1      6
12       2.5      6.3      6                 3.1     7.9      7
13       3.1      6.5      6                 3.6     7.9      7
14       3.3      6.6      6       7         3.7     7.8      7
15       3.5      6.7      6                 4.5     7.9      7
16       4.0      6.5      6                 4.9     8.6      7
17       3.9      6.5      7                 4.4     8.4      8
18                6.1      6       8                 9.0      7
18.5     4.1                                 5.0
19                         7                                  8
20
20.5     5.0                                 5.5
21
22                                 9

* The visual span scores of the writer are based on the Temporal Visual Test.

The contents of Table XI show that in each of the studies considered, increases in memory span scores by age were obtained for both tests. The mental age norms obtained from the present study are smaller than the age norms obtained in other studies at corresponding levels, the differences being more marked at the younger levels. These differences may be attributed chiefly to one factor. In the earlier studies stimulus series of varying length were used in order that a threshold method, defined in terms of the longest series reproducible by a certain proportion (usually 50 or 75 percent) of a given age group, could be employed for determining size of span. The writer’s age norms, on the other hand, are in every case arithmetical averages of individual scores (accuracy of placement scores in this instance) based on 20 test trials of uniform length. The number of items in each stimulus series was intentionally made larger than the number which the subjects could possibly report. Such an increase in the number of items tends to result in a reduction in the relative number which can be remembered and reported, because when the individual tries to group more items than he can hold in mind at one time, the effects of both proactive and retroactive inhibition operate to reduce the number eventually reproduced accurately below the number that he could apprehend, were a threshold method used. The differences in span between the youngest and oldest of our subjects may be explained in terms of the relatively greater effect of inhibition present in the youngest subjects, which is to say, that when the discrepancy between the number of items with which the subject is presented and the number of items he is potentially able to maintain in mind at one time is increased, the loss through inhibition in number of items reported will be greater. Thus is explained, in part at least, not only the fact that the figures of the present writer (Table XI) are smaller than the corresponding norms of the others, but also the fact that the differences are greater at the earlier age levels than they are at the later levels. It should be remarked that another factor operated to produce these differences. In Hao’s and Smedley’s studies the subjects were given as many as four trials to reproduce a series of a given length. The subjects needed to reproduce only one series to be credited with a span of that length. Terman’s procedure allows three trials per length. The fact that the three earlier studies employed digits instead of letters is not of grave concern, since familiarity with both types of materials for all practical purposes is the same. Hao and Smedley administered their tests by the group method, Terman individually.

“Cf. Tables II and III, Terman (1937).
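
The contrast between the two scoring procedures discussed above can be made concrete with a minimal Python sketch. The pass rates and individual scores are invented purely for illustration and are not data from any of the studies cited; the function names `threshold_span` and `mean_span` are likewise hypothetical.

```python
from statistics import mean


def threshold_span(pass_rates, criterion=0.5):
    """Threshold method: the longest series length reproduced by at least
    `criterion` (e.g. 0.50 or 0.75) of an age group.

    pass_rates maps series length -> proportion of the group reproducing it.
    """
    passed = [length for length, rate in pass_rates.items() if rate >= criterion]
    return max(passed) if passed else 0


def mean_span(individual_scores):
    """Averaging method: the arithmetical mean of individual scores."""
    return mean(individual_scores)


# Hypothetical illustration only (not data from the study).
pass_rates = {4: 0.95, 5: 0.80, 6: 0.55, 7: 0.20}
individual_scores = [2.0, 3.5, 2.5, 3.0, 4.0]

span_50 = threshold_span(pass_rates, criterion=0.50)
span_75 = threshold_span(pass_rates, criterion=0.75)
avg = mean_span(individual_scores)
print(span_50, span_75, avg)
```

A threshold norm credits a group with the longest series that a sufficient share of its members can reproduce, whereas a mean of scores earned on over-long series reflects typical performance, so the two kinds of norm would not be expected to agree in absolute size.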

Whatever the merits of the respective 
methods of measuring memory span are, it 
would seem that the method employed in the 
present study was as successful, if not more so, 
in differentiating successive mental age group- 
ings than the threshold methods employed by 
Hao, Smedley, and Terman. The increase by 
age for the present Auditory and Temporal 
Visual Tests is as consistent as the increases 
reported by Hao and Smedley for their cor- 
responding tests. Also, the present writer 
finds just as refined distinctions from age to 
age as did Hao, and more refined differences 
than did Smedley, when his data in terms of 
numbers of letters reproduced are reported as 
age norms. The present results compare 
favorably with Terman’s figures with respect 
to these considerations inasmuch as they 
apply to auditory memory span. Finally, the 
range of variation, in terms of the difference 
in scores (Table XI) between the oldest and 
youngest levels, is greater for both the present 
tests of auditory and visual span than for the 
corresponding differences found for these tests 
by the other investigators. 

The memory span means by grade place- 
ment of the subjects are presented in Figure 
3. The three methods of scoring the tests of 
memory span do not differentiate as consist- 
ently between the lower grade levels as they 
differentiate between the higher grade levels. 
Of course, this is to be expected, since data 
for grades seven, nine, and eleven have been 
omitted. The implication of this finding is 
that, insofar as the present study was able 
to determine, growth in memory span scores 
does not reach a limit within the grade range 
investigated. 

Certain definite tendencies with respect to differences in memory span scores at a given age or grade level are attributed to the three tests employed and to the three methods of scoring. Though the differences brought about by these factors vary with the classifications according to chronological age, mental age, and grade placement, several specific observations can be made.

[Figure 3, three panels: A. Tachistoscopic Visual Test; B. Auditory Test; C. Temporal Visual Test. Each panel plots gross span scores, accuracy of placement scores, and range scores against grade.]

Figure 3. Mean Scores on All Measures of Memory Span for 310 Subjects Classified According to Grade Placement.

Regardless of the manner in which the sub- 
jects are classified, the Tachistoscopic Visual 
Test scores were always lower than corre- 
sponding Auditory and Temporal Visual Test 
scores for all methods of scoring. With the 
exceptions of the scores of the subjects at 
year seven, on both the chronological and 
mental age classifications and on the range 
scores for the Auditory Test at the 13-year 
mental level, the Temporal Visual Test scores 
were always higher than corresponding Audi- 
tory Test scores. The grade averages show 
essentially the same tendency. With the ex- 
ception of the range scores of the Auditory 
Test in grade ten, Temporal Visual Test 
scores are always larger than corresponding 
Auditory Test scores. 

The effect of method of scoring on the size 
of memory span scores is not constant for 
the three tests.’* In the case of the Tachisto- 
scopic Visual Test the gross span scores are 
always larger than the range scores, regard- 
less of which classification of subjects is used. 
According to the chronological and mental 
age classifications, the gross span scores tend 
to be larger than corresponding range scores 
for both Auditory and the Temporal Visual 
tests for the subjects between the ages of 
seven and fifteen years; beyond this age level 
the range scores are larger for the remainder 
of the chronological and mental age ranges 
for both tests, with the exception that the 
gross span scores are larger than the corre- 
sponding range scores on the Temporal Visual 
Test at the chronological age level of 15 
years. The tendency stated in other words is 
that at the higher chronological and mental 
age levels the subjects were able to report 
letters separated by wider intervals, although 
the letters of the intermediate positions were 
not reported with a corresponding increase in
accuracy. The averages for the grade classi- 
fication of subjects (Figure 3) show definite 
tendencies with respect to the superiority of 
either gross span or range scores. 

to definition, accuracy of are
anal of i
deal with the of methods ‘of ‘scoring

The foregoing discussion of the effect of chronological age, mental age, and grade placement on memory span can be summarized in the following statements:

The most regular growth in memory span
development is shown in Figure 2, which
presents average scores of subjects classified 
according to mental age. There were fewer 
interruptions occurring from year to year, and 
age increases were obtained throughout the 
twenty-first year, the highest mental age level 
studied; whereas when the subjects were clas- 
sified according to chronological age there 
were more interruptions from year to year, 
and age increases were not obtained beyond 
the chronological age of 18 years. Stated in 
other terms, it would seem that, although the 
present tests of memory span do not differ- 
entiate between the two uppermost chrono- 
logical age levels (18, 19) they do, in general, 
continue to differentiate between successive 
mental age levels above year eighteen. 

The age norms resulting from the present 
study are lower, in terms of numbers of items 
reproduced, than norms of previous investi- 
gations in those instances where comparisons 
could be made. These differences are ex- 
plained chiefly by the facts that in previous 
studies stimulus units of varying length were 
used, and that, whereas the present writer 
computed arithmetical averages of individual 
scores, the earlier investigators employed 
threshold methods, in which only the best 
responses were considered in the scoring. 
However, the present tests of memory span 
distinguish between successive age levels 
equally as well as the tests employed in 
earlier studies. 

With only few exceptions, the highest 
scores of memory span in the present study 
occurred when using the Temporal Visual 
Test, regardless of which scoring method is 
used as a basis for comparison and regardless 
also of the age or grade. The scores on the 
Tachistoscopic Visual Test were consistently 
the lowest. In other words, the method of 
serial presentation yielded the highest scores. 
The Temporal Visual Test and the Auditory 
Test discriminated equally well between suc- 
cessive mental age levels. The implications 
of this finding have been noted above. 

The effect of scoring method is not con- 
stant for all tests nor for all age levels. On 
the two serially presented tests, gross span 
scores tended to be larger than corresponding 
range scores until the fifteenth year, accord- 
ing to both chronological and mental age 
classifications. Beginning with the fifteenth year, however, the range scores exceeded the corresponding gross span scores. On the Tachistoscopic Visual Test the gross span scores are always larger than the corresponding range scores of that test, regardless of the age or grade level at which the comparison is made.
E. The Effect of Letter Position. It has 
been shown in the immediately preceding sec- 
tion of this report that there are increases in 
memory span with increases in grade level. 
In this section an attempt is made to discover 
whether there was a tendency for the sub- 
jects to report letters with greater frequency 
at certain positions within the letter series 
than at others, and especially to discover 
whether there are any characteristic changes 
in the frequency with which letters are re- 
ported at certain positions as the memory 
span increases from grade to grade. The re- 
sults secured from the three tests with re- 
spect to this analysis will also be compared. 
In order to investigate these problems, accuracy of placement scores at each position were converted into percentages of the total number of letters apprehended in each grade. The percentages thus obtained are presented in Table XII. These data, when plotted in the form of curves on a graph, show clearly that the letters in the initial positions, particularly in the Tachistoscopic Visual Test, were reported with the greatest frequency. Consider Figure 4. It shows the percent scores of the Tachistoscopic Visual Test by letter position for each grade. From grade two through eight it will be observed immediately that the curves in all cases are at their peak in the first letter position. Letters in the second position were reported with greater frequency than those at the third, and the letters at the third with greater frequency than those at the fourth, and so on until the curve reached the abscissa. The same results occurred in grades ten and twelve, with the exception of the slight rise at the eighth letter position in grade ten, and the rise at the ninth position in grade twelve.
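
The conversion of position scores into percentages can be sketched in a few lines of Python. This is only an illustration of the arithmetic as described in the text: the raw totals below are hypothetical and the function name `position_percentages` is an assumption, not part of the original procedure.

```python
def position_percentages(position_scores):
    """Express the score at each letter position as a percentage of the
    total number of letters credited across all positions."""
    total = sum(position_scores)
    if total == 0:
        # No letters credited at all; avoid dividing by zero.
        return [0.0] * len(position_scores)
    return [100.0 * score / total for score in position_scores]


# Hypothetical accuracy-of-placement totals for letter positions 1-9
# (invented for illustration, not taken from the study).
raw = [100, 50, 24, 12, 6, 4, 2, 0, 2]
pct = position_percentages(raw)
print(pct)
```

Because every row is expressed relative to its own total, grades with very different absolute spans can be compared on the shape of their curves alone.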


TABLE XII 
LETTER POSITION SCORES IN PERCENT BY GRADE FOR THREE LABORATORY TESTS 

Letter Position 1 2 3 4 5 6 7 8 9
Test 1 25.8 Lez ow e4 | 0.0 

Grade 2 Test 2 30.2 19.4 12.6 6.7 3.8 1.8 2.8 7.4 15.5 
Test 3 36.6 25.2 19.5 10.8 3.3 8 0S eo 2.9 

Test 1 67.7 22.5 6.9 1.4 1.1 eo Pe | 0,0 ol 

Grade 3 Test 2 22.5 15-7 12.2 6.4 3.5 2.8 6.1 12.4 20.4 
Test 3 26.2 20.6 17.4 11.3 6.5 2.9 1.9 2.2 10.8 

Test l 56.9 22.1 13.8 5.0 1.6 ed ee 0.0 el 

Grade 4 Test 2 32.7 21.0 2.6 2064 3.8 722 
Test 3 26.2 19.8 16.6 11.0 6.7 4.0 3.9 4.0 7.7 

Test 1 56.4 21.8 11.9 5.1 2.6 1.8 ed of <A 

Grade 5 . Test 2 22.5 12.6 10.7 6.0 4.4 5.2 6.9 12.4 18.7 
: Test 3 22.2 15.2 13.3 8.8 5.6 4.2 5.0 8.4 17.4 

Test 54.4 22.4 13.1 6.6 2.5 el 02 

Grade 6 Test 2 19.5 13.8 10.8 8.6 6.0 4.3 7.0 13.5 16.6 
Test 3 21.4 16.9 15.9 12.4 9.2 4.9 3.6 _ 5,6 10.0 

Test l 50.8 23.1 13.7 7.1 3.8 1.5 0.0 el ol 

Grade 8 Test 2 22.4 13.1 10.0 7.4 6.4 5.7 8.0 11.7 15.5 
Test 3 25.3 16.5 14.3 9.4 Tel 467 6,5 12,0 

Test l 39.8 21.35 13.7 7.4 $.4 3.7 S32 3.6 2.0 

Grade 10 Test 2 15.9 10.1 9.0 8.1 8.9 7.4 10.8 15.1 14.5 
Test 3 20.2 14.6 12.0 9.0 8.5 7.2 7.6 9.0121) 

Test 1 34.2 19.0 15.1 12.1 9.7 4.9 14 05 5.0 

Grade 12 Test 2 13.0 9.1 7.9 7.2 907 9.3 13.7 14.4 15.7 
Test 3 16.1 12.5 10.5 8.9 8.1 8.2 10.2 11.5 14.0. 


[Figure 4: one panel for each grade (2, 3, 4, 5, 6, 8, 10, 12), each plotting score against letter position.]

Figure 4. Accuracy of Placement Scores by Letter Position on the Tachistoscopic Visual Test for Successive Grade Levels.


At the lower grade levels the curves decline 
abruptly; at the upper grades the decline is 
somewhat more gradual. These data can be 
generalized in the statement that the in- 
creases in tachistoscopic span, by grade, as 
evidenced by the curves of letter position, are 
to be described in terms of a cumulative in- 
crease in scores by letter position from left 
to right. 

The above results are interesting when it 
is recalled that fixation was directed at the 
middle, or fifth position, within each series 
for all tachistoscopic exposures. Why did the 
subjects not report with greatest frequency 
the letters which fell at the point on the 
screen which was fixated directly? This ques- 
tion might be answered in part, at least, in 
terms of habits which have been learned 
through practice in reading. Since the Eng- 
lish language is read from left to right and 
children are taught to attack words in the 
same direction, they become conditioned to 
direct their attention to the beginnings of lines and words. It seems reasonable to suppose, therefore, that our subjects transferred this habit of attending to their apprehension of the nonsense words used in this experiment. This does not imply, however, that the point of fixation was changed. It is a well-known psychological fact that the point of direct fixation, and the point attended to, do not need by any means to be the same. It is pertinent in this connection to note the findings of Anderson and Crosland** who discovered that Hebrew children studying English and Hebrew concurrently, reported letters in the position at the right, namely, the ninth, with the highest frequency when nine-letter nonsense words consisting of Hebrew characters were presented tachistoscopically. One of their original graphs is reproduced as Figure 5. The broken-line curve portrays the letter position scores of these children for Hebrew characters, the solid-line curve their scores for Roman characters. The only point that need be noted here is that the letters at the ninth position were reported with the highest frequency when the stimulus words consisted of Hebrew characters, and the letters at the initial positions were reported with the highest frequency when the stimulus words were composed of Roman characters.

[Figure 5: two curves plotting letter position scores against letter position.]

Figure 5. Tachistoscopic Letter Position Scores of 38 Hebrew Children for Hebrew and Roman characters. Solid line, Roman characters. Broken line, Hebrew characters.


Since it was argued above that the tendency 
to report letters at the left-most positions is 
learned through reading English, it would 
seem that the better the reader the more pro- 
nounced this tendency should be, and con- 
trariwise, if a child has not learned to read 
well, perhaps his tendency to report letters at 
the left-most positions will not be as strong. 
In the light of the tendency of some poor 
readers to attack the end letters of words 
first, it might even be predicted that poor 
readers will catch a disproportionate number 
of letters at the right-most positions. The problem as stated has recently been investigated thoroughly by Crosland (1939) with groups of third grade children. He found that there were definite differences in letter position scores between good and poor readers. The good readers as a group excelled the poor readers in the frequency with which they reported letters in the first few positions within the letter series, whereas his group of poor readers excelled the good readers in reporting letters at the right-most position.


data by 1. H. Anderson and R. Crosland 
af of Psychology. 


[Figure 6: one panel for each grade, each plotting score in percent against letter position.]

Figure 6. Accuracy of Placement Scores by Letter Position on the Auditory Test for Successive Grade Levels.



The implication of these results is obvious. 
Children should be encouraged to direct their 
attention to the beginnings of words and to 
perceive them from left to right. Why some 
poor readers fail to develop such habits of 
attending may be explained in terms of a 
number of factors. Crosland’s recent report 
presents evidence to indicate that left-handedness and especially left-eyedness might
be important conditions.

Before proceeding to other considerations 
it might be well to review our results in the 
light of the findings of previous investigators 
who have studied the problem of the effect 
of letter position on tachistoscopic reports. 
The results secured by the present writer for 
grades ten and twelve show the same tendency 
evident in the letter position data reported 
by Tinker (1932) for 300 university students. 
These adult subjects reported with greatest 
frequency the letters in positions at the ex- 
treme left, with a gradual diminution in per- 
cent scores up to the eighth position where 
the curve changed its direction upwards. 
Tinker’s exposure time was approximately 
three seconds per test trial, hence the eye 
movements of his subjects were not controlled. 

In another study employing college stu- 
dents as subjects, Crosland and Johnson 
(1928) found somewhat the same tendency 
in letter position scores. Their results are not 
strictly comparable with Tinker’s or those of 
the present study, since they used stimulus 
units varying in length from three to ten 
letters. Actually, there is little difference be- 
tween the letter position percent scores re- 
ported by Crosland and Johnson and those 
reported by Tinker, or the present writer. 
The fact that the former authors employed 
a truly tachistoscopic technique may be responsible for the closer similarity their results
bear to those of the present writer.**

The letter position curves for the Auditory (Figure 6) and the Temporal Visual (Figure 7) Tests resemble each other to a greater extent than either approximates the curves yielded by the Tachistoscopic Test, although there is a rather marked similarity between the curves of the two visual tests at grades two, three, and four. The curves for the Temporal Visual Test (Figure 7) indicate that at all levels the subjects reported with greater frequency the letters in the positions at the extreme left than the letters appearing in the positions at the extreme right, as was the case for the Tachistoscopic Visual Test. At no level, however, was the difference between the memory value of the extremes as pronounced as in the Tachistoscopic Test.

™ Percent scores found by Crosland and Johnson (1928)
30 adult from left to
for ten letter for
12.4, 10.4, 8.1, and 9.7.

Except for grade twelve, the remarks which were made for the Temporal Visual Test apply, in general, also to the Auditory Test. The superiority in memory value of the positions at the left in the case of the latter, however, is not as marked as in the case of the former. The letter position scores for both tests tend to approximate each other with increased grade.

With regard to the problem of describing growth in memory span by grades, an important difference may be observed between the
two serial tests on the one hand and the 
Tachistoscopic Visual Test on the other. 
Whereas in the case of the Tachistoscopic 
Test the increased scores were described in 
terms of a cumulative increase with which 
letters were reported from left to right, the 
curves for the two serial tests show that the 
increases tend to proceed inward from both 
extremes. 

To return again to the problem of compar- 
ing the curves for the three tests, one cannot 
but be impressed by the great similarity of 
those for the two serial tests, especially after 
grade three, and the dissimilarity in the curves 
between each of these and the Tachistoscopic 
Test. The dissimilarity consists essentially of 
a greater recall value of the final positions for 
the serial tests, which is to say, that in these 
tests the letters at both extremes tended to be 
favored in the reports, although, as stated 
above, the scores at the initial positions, with 
one exception, excelled the scores at the final 
position. These differences between extreme 
scores tended to be most pronounced at the 
earlier grade levels and least at the later 
levels. 
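
The shape of a serial letter position curve can be summarized numerically by locating its lowest point and comparing its two extremes. The sketch below does this for two rows of Table XII that survive cleanly in the printed table (Test 2 at grades two and ten). Reading "Test 2" as the Auditory Test is an assumption based on the order in which the tests are listed elsewhere in this report, and the helper name `describe_curve` is hypothetical.

```python
# Percent scores by letter position (positions 1-9), as printed in Table XII.
# Assumption: "Test 2" denotes the Auditory Test.
rows = {
    "Grade 2, Test 2": [30.2, 19.4, 12.6, 6.7, 3.8, 1.8, 2.8, 7.4, 15.5],
    "Grade 10, Test 2": [15.9, 10.1, 9.0, 8.1, 8.9, 7.4, 10.8, 15.1, 14.5],
}


def describe_curve(percents):
    """Return the 1-based position of the lowest score and the amount by
    which the first position exceeds the last."""
    lowest = min(percents)
    return {
        "lowest_position": percents.index(lowest) + 1,
        "first_minus_last": round(percents[0] - percents[-1], 1),
    }


summary = {label: describe_curve(values) for label, values in rows.items()}
for label, stats in summary.items():
    print(label, stats)
```

In both rows the lowest score falls at an interior position, and the advantage of the first position over the last is far smaller at grade ten than at grade two, in line with the tendency described above.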

The tendency for the subjects to report with relatively greater frequency the letters appearing at both end positions for the two serial tests can be interpreted in terms of well-known psychological principles. The first and last letters are remembered better because there is less chance of confusion at each extreme of a learning series. Memory for the letters at the initial positions is free from proactive inhibition, defined as interference caused by preceding material. At the final positions there can be no confusion caused by material which follows, hence an absence of retroactive inhibition. In the intermediate positions, confusion arises from both proactive and retroactive inhibition. Consequently, the accuracy of placement scores for these tests have been consistently lowest for the middle positions.

[Figure 7: one panel for each grade (2, 3, 4, 5, 6, 8, 10, 12), each plotting score in percent against letter position.]

Figure 7. Accuracy of Placement Scores by Letter Positions on the Temporal Visual Test for Successive Grade Levels.

Another explanation of results such as were 
obtained in the present study has been offered 
by Koffka (1935), and Koehler (1938). 
These investigators emphasize the fact that 
items occurring at the primary and final posi- 
tions of a learning series are “unique’’; they 
stand out somewhat as “figures on ground”, 
to use their terminology; and make a more 
vivid impression upon the learner. He remem- 
bers them better for that reason. Letters in 
the intermediate positions, however, do not 
stand out in contrast since they are imbedded 
within a series of homogeneous material. 
There is no figure-ground relationship in- 
volved in this case to emphasize certain let- 
ters in the mind of the learner. According to 
Koffka and Koehler, the “uniqueness” of the 
end-most items in a series tends to spread to 
adjacent letters to account for their relatively 
high memory value. Robinson (1920) and 
Robinson and Brown (1926) offered earlier 
an explanation in somewhat similar terms, 
arguing that items occurring at the extremes 
of a series coincided with a sudden change 
in attention and were remembered better for 
that reason. The authors, as did the present 
writer, found that primacy was a stronger 
factor than finality in facilitating memory of 
meaningless material. 

The other studies which have been con- 
ducted in this field are too numerous to men- 
tion here. Woodworth (1938) has summar- 
ized admirably the pertinent findings of this 
earlier work in two statements: (1) The 
middle members of a series of nonsense mate- 
rial are learned more slowly than the items 
at the extremes. (2) Learning occurs from 
the extremes inward. 

The data on letter position are summarized 
in the following statements: 

The letter position curves of the Tachisto- 
scopic Visual Test show that the frequency 
with which letters are reported at the nine 
positions decreases from left to right. The 
decline of the curves is more abrupt at the 
lower grades than at the upper grade levels. 
It naturally follows from this finding that the 
increases in spans by grades, which were observed for this test in an earlier section, are to be described in terms of an extension of these spans from left to right.

Rather remarkable similarities were observed in the letter position curves for the Auditory and Temporal Visual Tests. These resemblances were especially conspicuous after grade three.

The letter position curves for each of the 
serial tests, on the other hand, differed rather 
strikingly from the curves of the Tachisto- 
scopic Visual Test. The differences consisted 
of greater memory value for letters at the 
final position in the serial tests. In these tests 
letters at both extremes tended to be favored 
in the reports, whereas in the Tachistoscopic 
Visual Test reports of letters were confined 
largely to the initial positions. 

Even in the case of the serial tests, how- 
ever, the initial positions tended to be favored 
over the final positions. The superiority of 
the initial positions in letter position scores 
was more pronounced at the lower grade 
levels than at the later levels, where the scores 
for the two extreme positions tended to 
approximate each other. 

It follows that the increases in memory 
span scores, by grade, for the two serial tests 
and the Tachistoscopic Visual Test are to be 
described differently. In the case of the 
former the increases proceeded from both 
extremes inward, while in the latter, as re- 
marked above, the increases occurred uni- 
directionally, from left to right. 

F. The Relationship Between Memory 
Span and Reading Ability: 1. Correlations 
Between Memory Span Scores and Measures 
of Reading Ability. In Tables XIII and XIV 
are presented the correlations between each 
of the memory span scores and scores of 
silent reading comprehension and silent read- 
ing speed, respectively. An inspection of these 
coefficients will show that the correlations 
between the memory span scores and either 
of the two measures of reading ability used in 
this study were not, in general, high in terms 
of the criterion employed in a previous sec- 
tion, namely, that the z equivalent of a cor- 
relation coefficient should be at least three 
times as large as its standard error. 
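
The significance criterion stated above can be illustrated with a short Python sketch. The z equivalent of a correlation coefficient is Fisher's transformation, z = atanh(r). The formula 1/sqrt(n - 3) for the standard error of z is the usual textbook one and is an assumption here, since this section does not state how the standard errors were computed; the standard error of .18 and the sample size of 34 used in the example are likewise assumed values chosen for illustration.

```python
import math


def fisher_z(r):
    """z equivalent of a correlation coefficient (Fisher's transformation)."""
    return math.atanh(r)


def standard_error_z(n):
    """Usual standard error of z for a sample of n cases (assumed formula)."""
    return 1.0 / math.sqrt(n - 3)


def is_significant(r, se, multiple=3.0):
    """Criterion from the text: z must be at least three times its
    standard error."""
    return abs(fisher_z(r)) >= multiple * se


# Illustrative values: coefficients of .57 and .40 with an assumed
# standard error of .18.
for r in (0.57, 0.40):
    print(r, round(fisher_z(r), 2), is_significant(r, se=0.18))

# An assumed group of 34 cases gives a standard error of about .18.
print(round(standard_error_z(34), 2))
```

Under these assumed values a coefficient of .57 meets the criterion and one of .40 does not, which shows how demanding a threefold standard error is for groups of this size.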

The highest correlations were secured for 
grade six, and even these were low. Among 
the coefficients calculated between memory 
span and reading comprehension for this 
grade, there were only two large enough to
be statistically significant. The coefficient of




TABLE XIII 
CORRELATION BETWEEN READING GRADE SCORES AND NINE MEASURES OF MEMORY SPAN* 


I. Tachistoscopic Visual Test   II. Auditory Test   III. Temporal Visual Test   Mz

Grade Scores Scores Scores Scores Scores Scores Scores Scores Scores
(2.17) z 252 247 22 220 016 221 19 
220 219 18 02 052 204 
(2.18) z 220 219 18 202 234 204 e017 
4 r 019 206 216 225 230 220 012 
z -19 206 216 226 51 220 12 220 
5 r 205 223 221 221 206 
(22-16) 223 015 Pie 213 P+ | 06 els 
6 r 210 57 240 236 257 «40 
(2.18) z 234 210 65 242 38 65 242 26 
6 -.09 -,20 -.18 225 18 28 250 
10 r 212 243 015 -.02 ol? 232 e27 
(2.192 01 el2 -.02 -08 17 
2 226 -,02 202 28 223 -.12 210 0S 
-.02 202 223 -.12 210 -.0 
. 


*The abbreviations GS, AP, and R represent Gross Span, Accuracy of Placement, and 
Range, respectively. Mz represents mean z value. 


TABLE XIV 
CORRELATION BETWEEN SILENT READING SPEED SCORES AND NINE MEASURES OF MEMORY SPAN*


I. Tachistoscopic Visual Test    II. Auditory Test    III. Temporal Visual Test    Mz

Grade    GS Scores  AP Scores  R Scores   GS Scores  AP Scores  R Scores   GS Scores  AP Scores  R Scores


2 r .36 44 38 016 12 17 242 226 
(2.1) .38 47 240 16 045 39 
$ r .2 234 227 ell 205 42 252 220 
(=.16) z 25 235 239 228 ell 06 245 «20 227 
4 r 02 210 022 219 206 
(2.16) z 02 210 222 219 09 06 212 
5 r 31 38 210 08 06 
(2.16) z 417 32 40 o14 =, 08 +29 -08 006 17 
6 ro 6442 210 13 262 46 
+10 +13 +73 +95 50 34 
8 r -.17 -.15 222 206 Pie -.02 

10 204 48 ell 205 50 (023 221 

(@2.19 204 -.05 ell 205 219 221 18 
12 r 35 19 209 16 -.10 220 
(42.17) 2 37 +19 239 +20 07 

Ze 


*The abbreviations GS, AP, and R represent Gross Span, Accuracy of Placement, and
Range, respectively. Mz represents mean z value.




.57 occurred in two cases, namely, when the 
gross span scores of the Auditory and of the 
Temporal Visual Tests were correlated with 
silent reading comprehension. Two additional 
significant correlations appear in Table XIII, 
for the Tachistoscopic Visual Test in grade 
two. These were .48 and .50, for the gross 
span and accuracy of placement scores, 
respectively. 

When the correlations between speed of
silent reading scores and each measure of
memory span are inspected (Table XIV), it
will be observed that, although the coefficients 
are in a majority of the cases positive in 
sign, only a few are large enough to be sig- 
nificant. As was the case above, the highest 
correlations here are also found in grade six. 
Of the nine correlations computed for this 
grade, only three are high enough to be sig- 
nificant. These were the correlations of .62, 
.74, and .46 for the gross span scores of the 
Auditory Test, and for the gross span and 
accuracy of placement scores of the Temporal 
Visual Test. 

Since so few of the correlations reported in 
both Tables XIII and XIV are significantly 
high, the most constructively positive state- 
ment we can make from these tables as 
wholes is to remark that the correlations be- 
tween memory span scores and reading test 
scores are significant at a low level of rela- 
tionship. This finding is expressed on the 
basis that chance alone could not account for 
the consistent positive coefficients secured. 
Further, on the basis of the mean z values 
reported for succeeding grade levels it would 
seem that the relationship between memory 
span and reading ability, as measured by our 
reading tests, is more marked in grades two, 
three, and four (taken together) than in 
grades eight, ten, and twelve.¹⁵


The results presented here are not to be 
considered atypical, for where comparisons 
can be made, they are in general agreement 
with the findings of previous research. Davis 
(1932), for example, who made a study of 
the visual and auditory spans of 36 children 
just completing the first grade, found a cor- 
relation of only .40 between visual span scores 
and scores on an objective reading test, and 
of .15 between auditory span and the same
test of reading. Although Davis used stimulus


¹⁵ From Table XIII, the mean z value for grades 2, 3,
and 4 is .23; for grades 8, 10, and 12, .13.




series of varying lengths (3-7), she kept rate 
of presentation constant for both tests, and 
credited only those reports in which the orig-
inal order of the stimulus items was preserved,
so that her results should be somewhat com-
parable with those of the present writer for
the accuracy of placement scores of the 
Auditory and Temporal Visual Tests in grade 
two. In the present study, the correlations 
reported for these measures in grade two were 
.21 and .20. 


The correlations between memory span 
scores and other intellectual functions, as 
found by previous investigators, are just as 
low. In a study of 50 unselected ninth-grade 
pupils, Clark (1924) found a correlation of 
—.03 between auditory digit span scores and 
total scores on the Revised Army Alpha. 
Hao’s (1924) coefficients between auditory 
span and teachers’ estimates of intelligence 
and of arithmetic ability were .15 and .18, 
and between temporal visual span and those 
measures, .37 and .46. These figures were 
based on data secured from 83 sixth-grade 
Chinese children. 


In summary, the following statements may 
be made regarding the discussion presented 
in this section: 

As measured by the correlation technique, 
there is, in general, a low but significant rela- 
tionship between scores on memory span 
tests and scores on the objective reading tests 
used in this study. Although a majority of 
the coefficients were positive, relatively few 
were large enough to be highly significant. 

On the basis of the z value technique, 
which permits an averaging of correlation 
coefficients, it would seem that the relation- 
ship between memory span scores and read- 
ing test scores may be higher at the lower 
grade levels. 
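
The averaging referred to above may be sketched as follows, on the assumption that the z value technique is Fisher's transformation: each coefficient is converted to z, the z values are averaged, and the mean z may be converted back to the scale of r. The coefficients in the example are hypothetical and serve only to show the arithmetic; they are not taken from Tables XIII or XIV.

```python
import math

def mean_correlation(rs):
    """Average correlations by way of their z equivalents.

    Each r is converted to z = arctanh(r), the z values are averaged,
    and the mean z is converted back to the r scale with tanh.
    Returns the pair (mean z, equivalent r).
    """
    zs = [math.atanh(r) for r in rs]
    mean_z = sum(zs) / len(zs)
    return mean_z, math.tanh(mean_z)

# Hypothetical coefficients, chosen only to illustrate the arithmetic.
example_rs = [0.10, 0.25, 0.40]
mean_z, mean_r = mean_correlation(example_rs)
print(round(mean_z, 3), round(mean_r, 3))
```

Averaging on the z scale avoids the distortion that arises when coefficients of unequal size are averaged directly.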


2. Differences in Memory Span Scores Be-
tween Matched Groups of Good and Poor
Readers. Although the correlation coefficients
reported do show definitely that scores on the
present tests of memory span do not differen-
tiate successfully between small differences of
reading ability, the question may be raised
as to whether the tests of memory span dif-
ferentiate between extreme groups of reading
ability. If the present tests do differentiate
between extremes, their purpose as tools to
be used in clinical diagnosis would still be
justified, since the ordinary clinical case is
usually extremely retarded. It is the purpose,
then, in this section to discover whether it is 
possible to differentiate extreme groups of 
good and poor readers on the basis of 
memory span scores. 

Twenty-six pairs of subjects from grades
three, four, five, six, and eight were matched
on the basis of grade, chronological age, and
Dearborn C I.Q. The members of each pair
were separated as widely as possible on read-
ing ability. The composite grade scores of
silent reading comprehension were used to
separate the partners of each pair. The Dear-
born C I.Q. was chosen in preference to the
composite I.Q., since the other mental tests
make relatively great demands on reading
ability, and any matching scheme executed on
the basis of tests which to a large extent
measure reading ability could not serve our
purpose when wide variations in reading abil-
ity were sought. The mean chronological age
of the good readers was 129.5 months, and
the mean Dearborn C I.Q. of this group was
109.3. The poor readers averaged 130.5
months in chronological age, and 110.0 in
Dearborn C I.Q. The differences in reading
grade score between the good reader and the 
poor reader of each pair ranged from .6 of a 
grade to 3.5 grades; the median difference 
was 1.45 grade score units. Nine measures of 
memory span (3 tests, each scored in three 
different ways) were recorded for each sub- 
ject employed in the matching. A total of 234 
pairs of memory span scores were thus tabu- 
lated for the 26 matched pairs. In 63 percent 
of these 234 pairs of scores, the score for the 
good reader exceeded that of the poor reader. 
The significance of this percentage may be 
interpreted in terms of the standard error of 
a percentage of 50 (Lindquist, 1938) which 
would obtain if the true difference between 
the memory span scores of good and poor 
readers were zero, namely, σ p.50 = 3.3 per-
cent. A percentage of 63 is removed 3.94σ
from the hypothetical mean percentage of 50;
hence, it can be stated with reasonable cer-
tainty that the observed percentage cannot
be explained by chance alone, since the terms
of the assumption are invalidated by the
obtained facts.
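
The computation just described may be sketched as follows. The sketch assumes the usual formula for the standard error of a percentage, 100 √(pq/N), with N = 234 comparisons (26 pairs, each with nine measures); the function names are illustrative.

```python
import math

def se_of_percentage(p_percent: float, n: int) -> float:
    """Standard error, in percentage points, of a percentage based on n cases."""
    p = p_percent / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n)

def sigma_distance(observed_percent: float, expected_percent: float, n: int) -> float:
    """Distance of an observed percentage from the expected one, in SE units."""
    return (observed_percent - expected_percent) / se_of_percentage(expected_percent, n)

se = se_of_percentage(50, 234)          # 26 pairs x 9 measures = 234 comparisons
distance = sigma_distance(63, 50, 234)
print(round(se, 1), round(distance, 2))
```

With the standard error carried unrounded the distance comes to about 3.98σ; the 3.94σ given in the text corresponds to dividing 13 by the rounded value of 3.3.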


When the comparison between good and
poor readers is made in terms of group mean
scores on the memory span tests, the scores
of the former for all measures exceeded the
scores obtained for the latter.¹⁶ The relevant
figures are shown in Table XV. It will be 
seen that the differences between the groups 
on each of the nine measures, though small, 
consistently favor the good readers as a group. 
Since, as has been indicated in a previous 
section, the intercorrelations between the tests 
were somewhat low, and the intercorrelations 
between methods of scoring each test were not 
on the whole especially high, it would seem 
that some significance can be attached to the 
consistency with which the differences in each 
of the nine comparisons presented in this table 
take the same direction, even though no 
single one of the obtained differences is sta- 
tistically reliable. 
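
The arithmetic behind Table XV may be sketched as follows. The sketch assumes that each standard error of a mean is SD / √(N − 1) with N = 26 pairs, and that the two standard errors are combined as for independent groups; both assumptions are inferred from the printed figures rather than stated in the text. The figures used are those printed in the second column of Table XV, and the function names are illustrative.

```python
import math

def se_of_mean(sd: float, n: int) -> float:
    """Standard error of a mean, with sqrt(n - 1) in the divisor (assumed)."""
    return sd / math.sqrt(n - 1)

def critical_ratio(m1, sd1, m2, sd2, n):
    """Return (ratio, se_diff) for the difference between two group means.

    The two standard errors are combined as for independent groups,
    an assumption inferred from the printed figures.
    """
    se_diff = math.sqrt(se_of_mean(sd1, n) ** 2 + se_of_mean(sd2, n) ** 2)
    return (m1 - m2) / se_diff, se_diff

# Second column of Table XV (Tachistoscopic Visual Test), 26 pairs.
ratio, se_diff = critical_ratio(33.08, 8.87, 29.42, 8.48, 26)
print(round(se_diff, 2), round(ratio, 2))   # prints 2.45 1.49
```

The two values agree with the entries printed for that column, which lends some support to the assumed formulas.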

It might be of interest to investigate 
whether the relationship between reading 
ability and size of memory span, as here 
measured, varies with differences in chrono- 
logical age of the subjects. One might expect 
larger differences to obtain in the younger 
rather than in the older subjects, because it 
would seem that it is at the younger levels 
that such factors as are measured by memory 
span tests are likely to be more closely identi- 
fied with the reading process, consisting, as it 
does, at these levels of the ability to recog- 
nize and to retain in their due order letters 
both seen and heard, but especially seen. 
Since in older subjects the factor of meaning 
becomes predominant, whatever influences 
these rather elementary processes may have 
in differentiating between good and poor 
readers might be masked by greater differ- 
ences in the comprehending aspect of reading 
so that memory span scores alone might not 
differentiate as significantly between good and 
poor readers at the older as at the younger 
age levels. 

In order to test the above hypothesis, in- 
sofar as the data of this matching experiment 
permitted, the differences in memory span 
scores between the good and poor readers of 
the ten youngest pairs of subjects were com- 
pared with the differences existing between 
the good and poor readers of the ten oldest 
pairs of subjects. In 72 percent of the pos- 
sible comparisons at the younger level, the 
good reader of each pair excelled the poor 


¹⁶ On the basis of the school health records, the good readers
and the poor readers present no differences in visual and
auditory acuity. For the good readers: 4 subjects had 20/30
vision; one subject, 20/40; and all others, 20/20, while in
auditory acuity, 17 subjects were normal; 5 subjects, very
good or keen. For the poor readers: 3 subjects had 20/30
vision; all others, 20/20, while in auditory acuity, 15 subjects
were normal; 7 subjects, very good or keen. Data on auditory
acuity of third-grade subjects were not available.




TABLE XV
COMPARISON OF MEMORY SPAN SCORES* OF GOOD AND POOR READERS

                              I. Tachistoscopic       II. Auditory Test       III. Temporal
                                 Visual Test                                     Visual Test
                                  GS      AP       R      GS      AP       R      GS      AP       R
M1 (good readers)              57.88   33.08   37.38   79.65   48.19   82.88   94.58   65.46   93.38
M2 (poor readers)              49.65   29.42   32.96   71.38   42.00   64.42   80.46   54.92   78.5?
M1 − M2                         8.03    3.66    4.92    8.27    6.19   16.46   14.12   10.54   14.28
SD1                            13.62    8.87   12.62   21.05   14.78   32.23   25.65   25.68   32.00
SD2                            12.81    8.48   10.82   18.31   14.34   30.17   17.92   23.22   36.43
σ M1                            2.72    1.77    2.52    4.21    2.96    6.45    5.13    5.14    6.42
σ M2                            2.56    1.70    2.16    3.66    2.87    6.03    3.58    4.64    7.20
σ (M1 − M2)                     3.74    2.45    3.32    5.58    4.12    8.83    6.26    6.92    9.7?
(M1 − M2) / σ (M1 − M2)         2.15    1.49    1.48    1.48    1.50    2.09    2.26    1.52    1.53

Probability of Chance Variation from Zero (chances in 100):  2  7  7  7  2  1  7


* Computed on the basis of 20 test trials, not average number of letters per trial.


TABLE XVI

MEMORY SPAN DIFFERENCES BETWEEN GOOD READERS AND POOR READERS AT TWO CHRONOLOGICAL
AGE LEVELS

                              I. Tachistoscopic       II. Auditory Test       III. Temporal
                                 Visual Test                                     Visual Test
                                  GS      AP       R      GS      AP       R      GS      AP       R

(C.A. Range: 8-1 to 10-11, Median C.A. 9-6)

1. Good Readers                 53.7    32.6                                    88.5    68.1    93.3
2. Poor Readers                 43.8    27.1    28.6    56.9    33.6    48.2    66.2    44.2    64.7
3. Mean diff. between
   1 and 2                       9.9     6.5     9.7     3.7     1.7     3.9    22.5    23.9    28.6
4. SD of diff.                  7.29    6.94    5.10   13.27   10.91   18.21   30.21   41.09   57.42
5. σM diff.                     2.45    2.31    1.70    4.42    3.64    6.07   10.07   13.70   19.14
6. Prob. mean diff. is chance variation
   from zero (chances in 100):  1  20  31  8626  1  4  7

(C.A. Range: 11-6 to 16-5, Median C.A. 11-10)

1. Good Readers                 60.5    32.8    36.3    97.6    57.9   107.7   108.3    73.5   103.2
2. Poor Readers                 55.1    31.4    37.7    81.1    50.4    79.8    92.9    62.0    85.4
3. Mean diff. between
   1 and 2                       5.4     1.4    -1.4    16.5     7.5    27.9    15.4    11.5    17.8
4. SD of diff.                 16.74    9.88   12.42   12.85   20.30   45.35   14.55   22.25   38.50
5. σM diff.                     5.58    3.29    4.14    4.28    6.77   14.44    4.85    7.42   12.86
6. Prob. mean diff. is chance variation
   from zero (chances in 100):  17  33  83  1s  3  6  8  1
7. Prob. diff. between M diff. (younger)
   and M diff. (older)
   due to chance                  23      16       1      2*     23*      7*      27      21      32


* Difference in favor of older subjects.




reader in memory span score; whereas in
only 54 percent of the cases at the older level 
did this hold true. The observed 72 percent 
(younger subjects) is removed from a hypo- 
thetical mean of 50 percent, which would 
obtain under the conditions assumed above, 
by 4.15σ, while the 54 percent observation is
but .75σ from the hypothetical mean. It fol-
lows, then, that whereas chance alone could
not account for the observed percentage of 
superiority in memory span scores of good 
readers at the younger age level, there are 
23 chances in 100, or approximately one in 
four, that chance alone could account for the 
results obtained for the older subjects. 
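
The two distances and the chance probabilities may be checked with the sketch below. It assumes that each age level supplies 90 comparisons (ten pairs, each with nine measures) and that the probability is the one-tailed area of the normal curve; neither assumption is stated in so many words in the text, and the function names are illustrative.

```python
import math

def upper_tail_probability(sigma_units: float) -> float:
    """One-tailed normal probability of a deviation at least this large."""
    return 0.5 * math.erfc(sigma_units / math.sqrt(2.0))

def chances_in_100(observed_percent: float, n: int) -> tuple:
    """Return (distance from 50 percent in SE units, chance probability per 100)."""
    se = 100.0 * math.sqrt(0.25 / n)      # standard error of a percentage of 50
    distance = (observed_percent - 50.0) / se
    return distance, 100.0 * upper_tail_probability(distance)

# Assumed: 10 pairs x 9 measures = 90 comparisons at each age level.
for label, pct in (("younger", 72), ("older", 54)):
    distance, chances = chances_in_100(pct, 90)
    print(label, round(distance, 2), round(chances, 1))
```

On these assumptions the older level yields between 22 and 23 chances in 100, in keeping with the figure given in the text, while the chance probability at the younger level is negligible.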


When the mean differences between good 
and poor readers were analyzed for the two 
chronological age levels, it was found (Table 
XVI) that the differences in favor of the 
younger good readers were proportionately 
much larger than the corresponding differ- 
ences for the older good readers on the two 
visual tests. This relationship was reversed 
for the Auditory Test, the mean differences 
for the three methods of scoring this test 
being proportionately larger in the case of the 
older than in younger subjects. 

Expressed in statistical terms (Cf. Table 
XVI) the probability that chance could ex- 
plain the differences between mean scores of 
good and poor readers on each of the three 
measures of both the Tachistoscopic and 
Temporal Visual Tests is greater at the older 
than at the younger levels. The reverse of 
this statement holds true for the three meas- 
ures of auditory span. It should be pointed 
out, however, (Cf. Table XVI, item 7) that 
the differences at the one age level are not 
sufficiently larger than the differences at the 
other age level to be statistically reliable. 

The following statements summarize the 
results presented in this unit: 


When the memory span scores of good and 
poor readers were compared, at different 
chronological age levels, the results show that 
the good readers excelled the poor readers 
(with whom they were paired) in memory 
span scores, judged on the basis of the per- 
centage of the scores earned by good readers, 
which exceeded the corresponding scores 
earned by the poor readers. Further, at the 
younger level ability to read well was more 
closely associated with relatively higher scores 
of memory span than was the case at the 
older level. The observed mean differences
between the memory span scores of good and
poor readers, although small, consistently 
favored the good readers and would seem to 
indicate that factors other than chance alone 
operated to produce the results obtained. 


On the basis of mean scores a closer rela- 
tionship seems to exist between visual mem- 
ory span scores and reading ability in the 
case of younger rather than older subjects, 
whereas the three measures of auditory span 
apparently differentiate better between groups 
of older rather than of younger good and poor 
readers. This statement must, however, be 
qualified, because its proof is not entirely un- 
equivocal. The statement is made on the 
basis of using consistency of results as the 
measure of significance. 


3. Differences in Reading Ability Between 
Matched Groups Segregated on the Basis of 
Memory Span Scores. In the preceding ex- 
periment, it was found that when the compar-
ison was made on the basis of mean scores,
memory span did not, in all cases, distinguish 
between good and poor readers in a highly 
significant manner. This may be explained by 
one of two reasons. Perhaps there is no closer 
relationship between reading ability and 
memory span than that indicated by the fig- 
ures, or factors other than memory span oper- 
ated to produce the differences in reading 
ability of these particular extreme groups. 
An analogy may be used to make this point 
clear: It should be obvious that persons with 
extremely serious eye defects will be handi- 
capped in their reading; yet an examination 
of a large group of poor readers may not 
necessarily reveal that any of them are visu- 
ally handicapped, because other factors may 
have contributed to the failure in reading. On 
the other hand, if a group of subjects, to 
begin with, is chosen on the basis of serious 
eye defects, it can be predicted that many of 
them will be poor readers. 

In the first case, we were dealing with the 
frequency with which eye defects are asso- 
ciated with poor reading; in the second, with 
the problem of how eye defects, when they do 
occur, affect ability to read. 

To return to the case in point, in the 
matching experiment described above, we are 
dealing only with the problem of the fre- 
quency with which a limited memory span 
may be associated with poor reading, so that 
our rather inconsistent results might be ex-
plained by the fact that in these particular
cases, factors other than memory span oper-
ated to differentiate the groups used. In the
present instance, the procedure of the above
experiment has been reversed, that is,
matched groups of subjects were separated as 
widely as possible on the basis of memory 
span scores at the outset, and then compared 
for their respective reading achievement 
scores. It is hoped that this experiment will 
be successful in isolating the effect of mem- 
ory span alone, as it is related to reading 
ability, so that it may be possible to answer 
the question as to whether a limited memory 
span, when it does occur, is accompanied by 
poor reading ability.

Twenty-four pairs of subjects from grades 
three, four, five, six, and eight were matched 
on the basis of grade, chronological age, and 
Dearborn C I.Q., the partners of each pair
being separated as widely as possible on the
basis of memory span scores.¹⁷ The mean
chronological age for the subjects (Group 1)
of high memory span was 124.5 months, mean
Dearborn C I.Q. 108.6; for the subjects
(Group 2) of low memory span, the mean
chronological age was 126 months, mean
Dearborn C I.Q. 110.5. The mean differences
in total scores for 20 trials for each test of
memory span between Group 1 and Group 2
ranged from 8.9 for accuracy of placement
scores on the Tachistoscopic Visual Test, to
41.7 for the range scores of the Temporal
Visual Test. In every case the advantage was
with Group 1 (see Table XVII).

¹⁷ There were no differences in the visual and auditory
acuity of the two groups of subjects.

In 75 percent of the 24 pairs, the subjects
with low memory span scores were inferior to
the subjects with high memory span scores in
reading achievement. In the remaining 25
percent of the pairs, the subjects from Group
1 (high memory span) earned the lower read-
ing scores. A test of significance for the
obtained 75 percent¹⁸ can be made by finding
the likelihood that chance alone could pro-
duce the observed result in a hypothetical
situation where the true difference between
the reading achievement scores of subjects
possessing high and low memory spans would be zero.
Then it could be reasoned thus: In a popu-
lation of 24 pairs, the standard error of a per-
centage of 50 is ten percent. The observed 75
percent value would be removed from the
hypothetical mean of 50 percent by 2.50
sigma. Chance alone could account for the
obtained result only six times (approx-
imately) in a thousand.
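
The reasoning above may be followed in the sketch below, which assumes a one-tailed area of the normal curve and compares the rounded standard error of ten percent used in the text with the unrounded value for 24 pairs. The function name is illustrative.

```python
import math

def chances_in_1000(observed_percent: float, se_percent: float) -> tuple:
    """Return (distance from 50 percent in SE units, chance probability per 1000)."""
    distance = (observed_percent - 50.0) / se_percent
    tail = 0.5 * math.erfc(distance / math.sqrt(2.0))   # one-tailed normal area
    return distance, 1000.0 * tail

exact_se = 100.0 * math.sqrt(0.25 / 24)   # unrounded standard error for 24 pairs

print(chances_in_1000(75, 10.0))       # SE rounded to ten percent, as in the text
print(chances_in_1000(75, exact_se))   # unrounded SE, slightly above ten percent
```

With the rounded standard error the result is about six chances in a thousand, as stated; the unrounded standard error raises the figure to about seven, which does not alter the conclusion.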

In summary, therefore, it can be stated that 
low memory span, when it does occur, may 
impose certain obstacles in learning to read. 
An introduction to the nature and manifesta- 
tions of these impediments is presented in the 
individual studies which follow. 

4. Case Studies. I—J. was referred to the 
writer for remedial work in reading in the 
early part of July, 1937. He had been in a
private school for three years, and had re- 
ceived a great amount of individual attention. 
A report from J.’s reading teacher stated that 

¹⁸ In nine of the cases, or 37½ percent, the difference in
reading achievement scores between the members of each pair
was large enough to be statistically reliable, in favor of the
individual with relatively high memory span; in other words,
low memory span was accompanied by inferior reading achieve-
ment. The probable errors of the reading grade scores were
determined for each grade level represented by the pairs em-
ployed in this matching experiment. The formula used for
each grade was: PE score = .6745 σ √(1 − r₁₁), in which σ is
the standard deviation of the reading grade scores, and r₁₁ is
the reliability coefficient. Differences were taken to be reliable
when they were at least as large as four times the probable
error of a test score at the grade level represented. Cf. Kelley
et al. (1929), p. 9.
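
The rule stated in this footnote may be sketched as follows. The standard deviation and reliability coefficient in the example are hypothetical, since the values actually used at each grade level are not given here; the function names are likewise illustrative.

```python
import math

def probable_error_of_score(sd: float, reliability: float) -> float:
    """Probable error of a test score: .6745 * SD * sqrt(1 - r11)."""
    return 0.6745 * sd * math.sqrt(1.0 - reliability)

def is_reliable_difference(score_a, score_b, sd, reliability, multiple=4.0):
    """True when the difference is at least `multiple` probable errors."""
    return abs(score_a - score_b) >= multiple * probable_error_of_score(sd, reliability)

# Hypothetical values: SD of 1.0 grade and a reliability of .91 at some grade level.
pe = probable_error_of_score(1.0, 0.91)
print(round(pe, 3), is_reliable_difference(4.5, 3.5, 1.0, 0.91))
```

With these hypothetical values a difference of one full grade would count as reliable, whereas a difference of half a grade would not.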


TABLE XVII
COMPARISON OF MEAN SCORES OF TWO MATCHED GROUPS* ON THREE TESTS OF MEMORY SPAN

                              I. Tachistoscopic       II. Auditory Test       III. Temporal
                                 Visual Test                                     Visual Test
                                  GS      AP       R      GS      AP       R      GS      AP       R
1. Group 1 Mean                 60.2    35.1    41.5    82.1    50.8    86.4    97.5    71.3   104.9
2. Group 2 Mean                 46.4    26.2    28.2    60.7    33.8    62.6    72.4    44.5    63.2
3. Difference between
   1 & 2                        13.8     8.9    13.3    21.4    17.0    35.8    25.1    27.0    41.7


* Group 1 included subjects of high memory span scores; Group 2, subjects of low mem-
ory span scores. The Groups were matched on the basis of chronological age and Dearborn C
I.Q. The mean scores were computed on the basis of 20 test trials, not average number of
letters per trial.




during his three years of school attendance, 
progress in reading was impeded by certain 
persistent visual and auditory confusions, 
although visual and auditory acuity had been 
found normal. Insofar as the writer was able 
to determine at the onset of the remedial pro- 
gram, J. could recognize in written form, in 
addition to his name, only fifteen or sixteen 
words from the Gates Primary Reading Tests, 
and not all the letters of the alphabet. 

The 1916 form of the Stanford—Binet Tests 
was administered to the subject on July 6, 
1937. He earned a mental age score of 10 
years, 5 months; his chronological age at the 
date of examination was 8 years, 5 months, 
and his I.Q. was 124. J. had failed all the
sub-tests of the Binet which required reading 
in any form. The greatest number of digits 
which he was able to reproduce in the forward 
manner was four. On tasks involving reason- 
ing and language comprehension he did very 
well, and he appeared to possess an unusual 
amount of knowledge, especially in the field 
of general science. It was not possible to 
administer any reading tests. 


Two approaches were tried with only a 
small amount of temporary success during the 
first month of the tutoring period. The first, 
a word-picture presentation, was discarded 
because J. could not remember the letters 
making up the words of each word-picture 
unit. Next, an attempt was made through a 
more auditory approach. Word families, such 
as “old”, “hold”, “gold”, and “fold”, and 
“at”, “cat”, “bat”, and “sat” were drilled 
orally, but J. could not retain in the correct 
order the component letters of these familiar
words. 


During the early part of August, J. was 
examined by Dr. Walter F. Dearborn, 
Director of the Harvard Psycho-Educational 
Clinic. It was found that he had great diffi- 
culty in reporting accurately certain visually- 
presented objective stimuli. Reversals were 
common. It was found, however, that J. had 
no difficulty with the kinesthetic patterns. 
Further, it was discovered with reasonable 
certainty that J. was both right-handed and 
right-eyed. 

In the light of these facts, it was necessary
to provide material for learning in which the
appeal must be made through channels other
than the visual or auditory sense fields. At
first, the tutor directed J.’s arm through the
motions necessary to produce “d-o-g”. This
was repeated for other easy words, and it was
found that later in the same day the subject 
was able to recognize the practiced words 
with relative ease. Modifications of this kines- 
thetic approach were used for several weeks, 
following the date of the laboratory examina- 
tion, with gratifying results. He was becom- 
ing more and more able to recognize words 
and phrases which had been traced only once. 


Care was taken to motivate and to encour- 
age J. as strongly as possible toward reading, 
and this, as well as the kinesthetic teaching 
approach, was undoubtedly responsible for 
his improvement. The comparative part 
played by each of these two factors could not 
be estimated. 


Before J. returned to school in the fall, he 
was given again Gray’s Oral Reading Para-
graphs, the Gates Primary Reading Test,
and the Gates Graded Word Pronunciation
Test. The reading grade equivalents of the
raw scores earned on the three tests were 2.3, 
2.5, and 2.6, respectively. 

During the school year of 1937-1938, J.’s 
remedial work, still under the writer’s obser- 
vation, was continued with the same general 
approach employed during the preceding sum- 
mer, although he had changed to another 
school and was being tutored by another in- 
dividual. He continued to make progress in 
his new school environment. The change of 
schools altered noticeably J.’s attitude of 
resignation to failure which had been so 
apparent when the writer first met him. 

In the months of July and August, 1938, 
the writer resumed remedial work with J. 
When Form L of the Revised Stanford—Binet 
Scale was administered, J. obtained an I.Q. 
of 142 as compared with the previous finding 
of 124, part of this increase being explainable 
by the fact that the 1937 revision is some- 
what easier than the 1916 revision. Again J. 
had difficulty with the memory tests, failing 
the digits test at year X, and the sentence 
memory test at year XI. He passed, however, 
the bead chain test at year XIII. On reading 
tests, J. was found to be reading at a grade 
score level of approximately 3.5, or an im- 
provement of one grade from the previous 
September. 

In the second summer of work with the 
writer, J. continued to progress in reading and 
spelling, and was working in the Elson—Gray 
Reader Book 4 at the close of the summer. 




J.’s outstanding weakness was not cor-
rected. He has continued to be extremely slow
in learning through visual or auditory chan- 
nels. Much of his difficulty appears to be an 
inability to keep in proper sequence certain 
visual symbols and a failure to synthesize dis- 
tinct familiar sounds to form words. Both of 
these inadequacies are believed to be trace- 
able, at least in part, to retardation in the 
aspects of mental development included in the 
term “memory span”. Treatment was not
aimed to correct these defects but to teach 
reading without being blocked by them. The 
teaching plan was to use the subject’s excel- 
lent abilities in other directions, such as 
kinesthetic learning and general reasoning, 
to build up memory and reading from the 
available mental sources. 

II—W. was referred to the writer for 
remedial work in reading in September, 1938. 
He was ten years, 10 months old. He tended 
to be physically sluggish and awkward. For 
five years he had attended the public schools. 
The parents did not become aware of his 
reading handicap until his third year in 
school. At that time, he was put back into 
the second grade with the hope that he could 
catch up in his reading. The results were 
negative. 

In February, 1937, W. was examined by an 
oculist who reported that he had less than 
40 percent vision in the right eye. Wearing 
the glasses which were prescribed brought 
about no noticeable improvement in his school 
work. In August, 1938, W. was examined at 
the Dartmouth Eye Clinic. A size difference 
of ocular images was discovered; special lenses 
were prescribed and have been worn since. In 
December, 1938, W. became acutely ill with 
infected sinuses which were found to be ad- 
versely affecting his vision but which have 
since been successfully treated. Defective 
vision and particularly aniseikonia, then, 
would be expected to be important factors in 
W.’s reading difficulty. 

In September, 1938, Form L of the Revised 
Stanford—Binet Scale was administered, and 
W. obtained a mental age score of 12-6, his 
chronological age being 10-10, his I.Q. 115.
A great source of trouble on the test, as a 
whole, was W.’s poor memory spans. He 
failed the bead chain test at year XIII, the 
digits test at years IX, X, and XIII. The 
memory for designs test at year XI was also 
a failure. He did, however, pass the sentence 
test at years XI and Average Adult, and the 


JOURNAL OF EXPERIMENTAL EDUCATION 


[Vol. 8, No 


word memory test at year XIII. Tests 9; 
visual and auditory memory span not involy. 
ing verbal, meaningful material were respon. 
sible for all failures at the years IX, X, ang 
XI. On tests involving language comprehen. 
sion and interpretation, W.’s responses were 
good, his vocabulary score being at approx. 
imately year XIV, the abstract words tes 
score at year XIV, and the opposites tes 
scores at the Average Adult level. 

On Gray’s Oral Reading Paragraphs, W.
scored at a grade level of 1.6, on the Gates
Graded Word Pronunciation Test, at 2.6.
Typical errors in word recognition included
“was” for “saw”, “three” for “their”,
“on” for “no”, “in” for “it”. He was able to
give only nineteen letters of the alphabet
from memory, although he wrote all of them
when dictated. On the Durrell-Sullivan Read-
ing Capacity Test, W. scored at a grade level
of 7.7, consonant with his score in language
comprehension and oral vocabulary on the
Binet Test. Since the Binet examination, W.
has been receiving for the most part indi-
vidualized instruction. Remedial work in
reading has used primarily a kinesthetic
approach, but training in phonetics was in- 
cluded. Difficult words were traced, written 
on a slate, and typed. Flash card drill was 
often used to measure the mastery of previ- 
ously confusing words and phrases. Sentence 
completion exercises, based on stories from 
W.’s readers, were used daily. 

After three months of remedial help W. was 
retested in reading. He scored at a grade level 
of 3.6 in the Gates Paragraph Reading Test,
at 2.7 in the Gates Sentence Reading Test, 
and at 3.2 in the Gates Word Recognition 
Test. When remedial help had been contin- 
ued five months, W. scored at 3.0 in the 
Gates Word Pronunciation Test. 

In general, W. was found to be handicapped 
by poor visual and auditory rote memory.
Since the Binet Examination was given after
the aniseikonia was corrected and showed
outstanding difficulty in visual memory, it
seems probable that W.’s eye defects, possibly
combined with defective development in rote
memory, made ordinary progress in reading
impossible during his first five years in school.

Remedial instruction was planned to use
W.’s excellent ability in language comprehension
and interpretation through his associative
ability. The kinesthetic and, to a lesser
degree, the phonetic approach were used in
teaching where visual and rote memory had
impeded learning. Desire to read was moti- 
vated as far as possible in W. through encour- 
agement and through making both teaching 
method and the material used for teaching 
that which would most strongly appeal to the 
subject. The extent to which this motivation 
caused W.’s improvement, and the respective 
parts played by the specific way of teaching 
and by the visual improvement could not be 
measured. No doubt, all were contributing 
factors. 

IV. Summary and Conclusions. The gen- 
eral and specific problems of the present in- 
vestigation will be reviewed before presenting 
together the results of the analyses made 
earlier. Notwithstanding the relatively great 
number of studies which have been conducted 
on the subject of memory span, no single 
study has included a wide enough range of 
considerations so that the problems in the 
study of memory span could be viewed in an 
integrated fashion. 

The general problem of this investigation 
was to study the visual and auditory memory 
spans of a representative sampling of public 
school children. For this purpose, three tests 
of memory span were devised, the three tests 
being indicated as Tachistoscopic Visual Test, 
Auditory Test, and Temporal Visual Test. 
These three tests of memory span were ad- 
ministered to 310 subjects, representing eight 
grade levels. 

The first specific problem was to determine 
the reliability of each test of memory span at 
each grade level studied, and to discover any 
differences which might exist between the re- 
liabilities of the three separate tests. A re- 
lated problem was to discover the reliability 
of each of three methods of scoring which 
were applied to the same test responses. On
the whole, these tests of memory span were
found to be as reliable as, if not more reliable
than, similar tests employed in earlier studies
of memory span. It is believed that the
obtained reliabilities are to be explained, in part,
by the more careful standardization made
possible by the use of a motion picture
projector and an electrical phonograph for
presenting the tests. The sizes of the reliability
coefficients varied with grade level of the
subjects, with method of scoring the test records,
and with the particular test being considered.
The tests have been found to be more reliable, 
in general, at the lower grade levels. They 
were most reliable in grade two and least
reliable in grade six on the same bases. The
most reliable scoring method was the one in 
which items were scored as correct, regardless 
of whether their original order was preserved 
in the subject’s responses. Generally speak- 
ing, the two visual tests of memory span were 
found to be more reliable than the test of 
auditory memory span. 
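
The distinction drawn above between scoring that ignores order and scoring that respects it can be illustrated with a small sketch. The two functions below are hypothetical stand-ins, not the scoring rules actually used in the study. They assume a 9-letter stimulus series with no repeated letters and a response recorded as a string of letters.

```python
def order_free_score(stimulus: str, response: str) -> int:
    """Count stimulus letters that appear anywhere in the response.

    Assumption: no letter is repeated within a stimulus series.
    """
    return len(set(stimulus) & set(response))


def placement_score(stimulus: str, response: str) -> int:
    """Count letters reported in the same serial position in which they were shown."""
    return sum(1 for shown, given in zip(stimulus, response) if shown == given)


# Invented example: the subject reports four letters, two of them transposed.
stimulus = "KRTPLMDQS"
response = "KRPT"
print(order_free_score(stimulus, response), placement_score(stimulus, response))
```

In this invented trial the order-free count credits all four letters, while the position-sensitive count credits only the two that kept their places, which shows why the two kinds of score need not be interchangeable.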

The next problem was to discover the rela- 
tionships between the three methods of scor- 
ing the same test. The intercorrelations be- 
tween the three scoring methods for the three 
tests were computed at each grade level. The 
correlations between the methods of scoring 
are, on the whole, significantly high, but not 
high enough to justify using the scores inter- 
changeably. On the basis of the evidence 
presented, it would seem that there were no 
large consistent differences in correlations be- 
tween the three methods of scoring from test 
to test. The two methods of scoring which 
were consistently most highly correlated were
the accuracy of placement and range scores. 

When the problem of the relationship be- 
tween visual and auditory spans was studied 
on the basis of the intercorrelations between 
comparable scores for the three tests, it was 
found that while these coefficients were not, 
on the whole, significantly high, they were 
somewhat higher when the measurements of 
these spans were made in terms of gross span 
scores. Further, the correlations between
scores on the two tests employing the method
of successive stimulation tended to be higher
than the correlations between either of these 
tests and the Tachistoscopic Visual Test, for 
which test the method of presentation must 
obviously be some form of simultaneous 
stimulation. 

Three classifications of our subjects have 
been made: (1) by chronological age, (2) by 
mental age, and (3) by grade placement. 
The averages for the three methods of scoring 
each test have been presented graphically. 
The most regular growth in memory span was 
shown in the classification which grouped sub- 
jects according to mental age. This manner 
of classifying the subjects resulted in fewer 
interruptions in memory span scores (aver- 
age) for successive age levels. On the basis 
of this classification of subjects, growth is
observed through the mental age level of
21 years, whereas according to the
chronological age classification of our subjects, it would
seem that the maximum in memory span 
development was reached at the eighteenth
year. The relatively small number of subjects
at the upper levels of both chronological and 
mental age levels permits only a tentative 
statement regarding the upper limits of mem- 
ory span growth at the age levels investigated 
by the present writer. 

The age norms resulting from the present 
study are lower than comparable norms se- 
cured by previous writers. This finding is 
explained chiefly by the fact that the use of 
stimulus units of varying lengths, adapted by 
the threshold method to the age levels investi- 
gated in the case of earlier studies, permitted 
the subjects of those previous investigations 
to report a relatively high percentage of each 
stimulus series presented. Another factor 
which contributed to this difference was the 
method of determining span scores in the 
earlier studies, in which instances “best re- 
sponses” were credited. In the present study, 
stimulus series of uniform length (9 letters 
of the alphabet arranged in nonsense order) 
were employed in the three tests, and span 
scores are arithmetical averages based on 20 
test trials. However, the present tests of
memory span distinguish between successive
age levels as well as the tests
employed in earlier studies. When the subjects
were classified according to grade placement, 
growth was evidenced, in general, from grade 
to grade on all tests for the three methods of 
scoring. 
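
The effect of the two ways of deriving a span score can be seen in a small numerical sketch. The trial scores below are invented for illustration only, and treating the "best responses" rule of earlier studies as the maximum over trials is an assumption. The averaging over 20 trials follows the description above.

```python
from statistics import mean


def mean_span(trial_scores: list[int]) -> float:
    """Span score as the arithmetical average over all test trials."""
    return mean(trial_scores)


def best_response_span(trial_scores: list[int]) -> int:
    """Span score crediting only the best response (assumed here to be the maximum)."""
    return max(trial_scores)


# Invented scores for one subject on 20 trials of a 9-letter series.
scores = [4, 5, 3, 4, 6, 5, 4, 4, 5, 3, 4, 5, 4, 6, 4, 5, 3, 4, 5, 4]
print(mean_span(scores), best_response_span(scores))
```

Because an average can never exceed the best single trial, norms built on averaged scores will tend to run lower than norms built on best responses, even for the same subjects.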

With only a few exceptions, the highest 
scores of memory span in the present study 
occurred with the Temporal Visual Test, while 
the scores on the Tachistoscopic Visual Test 
were consistently the lowest memory span 
scores obtained. The Temporal Visual and
the Auditory Tests discriminate equally
well between successive mental age levels.

A further analysis was made of the memory 
span data by grade placement. An attempt 
was made to discover whether there was a 
tendency for the subjects to report letters 
with greater frequency at certain positions 
within the letter series than at others, and 
especially to find any characteristic changes in 
this frequency as memory span increased 
from grade to grade. The results of the three 
tests were considered separately in these anal- 
yses, which were presented graphically in the 
form of letter position curves, plotted through 
points which corresponded to the percentage 
scores obtained at each letter position. The 
letter position curves for the Tachistoscopic 
Visual Test showed that the frequency with
which letters were reported at the nine
positions decreased from left to right. Moreover,
increases in span by grades were described in
terms of an extension of these spans from left
to right. Striking similarities were observed
in the letter position curves for the Auditory
and Temporal Visual Tests, especially after
grade three. The differences observed between
the two serially presented tests, on the one
hand, and the Tachistoscopic Visual Test on
the other, consisted in the greater memory
value for the letters at the final positions in
the case of the two serial tests. In these two
tests, letters at both extremes tended to be
favored in the report, whereas in the
Tachistoscopic Visual Test reports of letters were
confined largely to the positions at the left.
Even in the case of the two serial tests,
however, the initial positions tended to be favored
over the final positions, the superiority of the
initial position being more pronounced at the
lower grade levels than at the upper levels,
where the scores for the two extreme
positions tended to approximate each other. It is
possible to infer, then, that increases in
memory span scores, by grade, for the two serial
tests and the Tachistoscopic Test are to be
described differently. In the case of the
former, the increases proceeded from both
extreme positions inward; while in the latter,
as remarked above, the increases occurred
from left to right.
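
A letter position curve of the kind described above can be tabulated as follows. This is only a sketch under stated assumptions: a letter counts as reported if it appears anywhere in the response, and the stimulus series and responses shown are invented. The study's own tabulation rule may have differed.

```python
def letter_position_curve(trials: list[tuple[str, str]], n_positions: int = 9) -> list[float]:
    """Percentage of trials in which the letter shown at each position was reported.

    trials: (stimulus, response) pairs; each stimulus has n_positions letters.
    Assumption: a letter is 'reported' if it occurs anywhere in the response.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    counts = [0] * n_positions
    for stimulus, response in trials:
        reported = set(response)
        for position, letter in enumerate(stimulus[:n_positions]):
            if letter in reported:
                counts[position] += 1
    return [100.0 * count / len(trials) for count in counts]


# Invented serial-presentation trials: first and last letters are favored.
trials = [
    ("KRTPLMDQS", "KRT"),
    ("BFGHJNVWX", "BFX"),
    ("CDZQYLMPR", "CR"),
]
curve = letter_position_curve(trials)
print([round(value, 1) for value in curve])
```

In these invented data the curve is high at the first position, falls to zero in the middle, and rises again at the ninth position, the pattern described for the two serially presented tests. A curve that only declines from left to right would correspond to the pattern described for the Tachistoscopic Visual Test.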

The final problem was to discover what 
relationships exist between memory span 
scores and scores on standardized reading 
tests. Correlations were computed between 
each of two measures of silent reading ability 
(comprehension and speed) and each of the 
nine measures of memory span studied in this
experiment. The obtained figures, although
indicating a statistically significant
relationship, as judged by the consistency with which
the correlations of positive sign appeared,
were in no single case sufficiently large to
permit a prediction of reading achievement on
the basis of memory span scores. Further,
this problem of the relationship between
reading and memory span was studied by means
of two matched-group experiments. In the
first of these, groups of subjects were
separated on the basis of their reading
achievement scores and matched on the basis of
grade, chronological age, and Dearborn C
I.Q. When the mean memory span scores of
good readers and poor readers were compared,
it was found that although not one of the
obtained differences was in itself sufficiently 
large to be highly significant, the differences 
taken as a whole were larger than mere 
chance deviations, and in every case favored 
the good readers. The data of the first match- 
ing experiment indicated the existence of a 
closer relationship between memory span 
scores and reading ability in the case of 
younger as against older subjects. The results 
of the second matching experiment, in which 
the subjects were separated on widest possible 
differences in memory span scores, to begin 
with, and matched as before on the basis of 
grade, chronological age, and Dearborn C 
I.Q., tended to show that in some cases of
serious reading retardation, especially at the 
lower grade levels, extremely low memory 
span scores were important contributing 
factors. The writer has presented studies of 
two individual cases, in which there was evi- 
dence that poor visual and auditory memory 
span contributed to the reading disabilities 
manifested by the subjects. 

The following conclusions are drawn from 
the results which have been secured in this 
study: 

1. The group method of measuring memory 
span yields reliability coefficients of adequate 
size for group diagnosis at all grade levels, 
and reasonably high coefficients for individual 
diagnosis at the second grade level. 

2. The correlations between the methods 
of scoring used in this experiment are not 
high enough to justify using the methods 
interchangeably. 

3. Wherever a relation may exist between
auditory and visual memory span, it is more
likely to be explained on the basis of
similarity in method of presentation of the tests
involved in the relationship than on the basis
of the existence of a generalized memory span
ability.

4. The present tests measure growth in 
memory span, as defined, at the age levels 
where they have been applied. 

5. Growth in tachistoscopic span occurs
from left to right, whereas growth in each of
the two serially presented tests occurs from
both extremes inward.

6. The present tests of memory span cannot
be used to predict reading test scores.

7. In extreme cases of serious retardation
in reading achievement, limited memory span 
ability might be an important contributing 
factor, especially with younger subjects. 




SEX DIFFERENCES IN SPEED OF READING: A CORRECTION 


Joseph E. Moore
George Peabody College for Teachers 


The writer is indebted to Dr. George K. 
Bennett and Dr. Sydney Roslow of the Psy- 
chological Corporation for checking the data 
and pointing out certain errors appearing in 
Table II, page 113, of the September (1939) 
number of the Journal of Experimental Edu- 
cation. Fortunately the errors did not change 
the basic findings appreciably, but they 
should be corrected in the interest of accu- 
racy. 

The original and corrected columns in
Table II (numbered the same as in the
original table) along with the revised
interpretations are presented below.


It appears from Table II that only two 
(not four as formerly reported) of the ob- 
tained differences are reliable statistically, 
and these favor the girls. The differences 
favoring the girls at the high school level in 
the tenth and eleventh grades are reliable. 
There are no statistically reliable differences 
appearing at the college level. The greatest 
difference in reading speed was found in the 
sophomore subjects and this difference 
favored the girls. The difference at the
sophomore level is quite similar to that found by
Berman and Bird¹ in testing a group of 463
sophomores. When all the male subjects are
compared with all the female subjects in the 
present study, the difference in reading speed 
favors the girls and is statistically reliable. 
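
The relation between the critical ratio and the "chances in 100" column of Table II can be sketched in a few lines of Python. Two points are assumptions rather than statements from the note: that the chances are read from the normal curve as the area below the critical ratio, and that a critical ratio of 3.0 or more marks a statistically reliable difference. The second assumption agrees with the count of two reliable class differences given above. The function names are illustrative.

```python
import math


def critical_ratio(difference: float, se_difference: float) -> float:
    """Critical ratio: the difference between means divided by its standard error."""
    return difference / se_difference


def chances_in_100(cr: float) -> float:
    """Chances in 100 of a true difference, taken as the normal-curve area below cr.

    Assumption: the table's column was derived from the normal curve.
    """
    return 100.0 * 0.5 * (1.0 + math.erf(cr / math.sqrt(2.0)))


def is_reliable(cr: float, threshold: float = 3.0) -> bool:
    """Assumed convention: a critical ratio of 3.0 or more is statistically reliable."""
    return cr >= threshold


# Legible corrected critical ratios for the class comparisons in Table II
# (one class value is not legible in the table and is omitted here).
corrected = [2.78, 1.76, 3.13, 3.30, 0.78, 2.65, 2.17, 0.95]
print(sum(is_reliable(cr) for cr in corrected))
print(round(chances_in_100(0.78), 1))
```

The normal-curve values come close to the tabled figures but do not match every entry to the last decimal, so the table may rest on a slightly different published conversion.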

Summary of Results. Girls appear to be
consistently more rapid readers than boys at 
each grade level from the eighth grade in 
junior high school through the sophomore 
year in college. This superiority is apparent, 
even though the sampling of girls was more 
positively skewed than was that of the boys. 
The difference between the means in the 
number of paragraphs read by the boys and
by the girls was statistically reliable in only 
two (not four as previously stated) of the 
nine comparisons, but both instances favored 
the girls. When all the girls’ scores are com- 
bined into one distribution and compared with 
the combined scores of the boys, the mean 
score of the girls excels that of the boys and 
the difference is statistically reliable. 

Boys surpassed girls in the number of 
paragraphs read at the junior and senior 
years in college, but this difference was not 
great enough to be statistically reliable. 


¹ Isabelle Berman and Charles Bird, “Sex Differences in
Journal of Applied Psychology, XVII 


TABLE II (Revised)


THE ORIGINAL AND CORRECTED COLUMNS SHOWING THE CLASS, THE STANDARD ERROR OF THE
DIFFERENCE, THE CRITICAL RATIO, AND THE CHANCES IN ONE HUNDRED
THAT DIFFERENCES ARE SIGNIFICANT


                            Original                       Corrected
                                      Chances                        Chances
Class             S.D. Diff.   C.R.   in 100     S.D. Diff.   C.R.   in 100
Eighth Grade        1.11       2.78    100         1.11       2.78    100
Ninth Grade         ...        ...     100          .94       1.76     96.0
Tenth Grade         ...        ...      99         1.12       3.13    100
Eleventh Grade      1.06       3.30    100         1.06       3.30    100
Twelfth Grade       ...        ...      99          .76        .78     78.2
Freshmen             .24       1.24     89         ...        ...      65.2
Sophomores           .25       8.41    100          .80       2.65     99.5
Juniors             ...        ...      98         1.23*      2.17     98.6
Seniors             1.23*      1.05     85         1.36*       .95     83.0
Combined Total       .34       5.80    100          .34       5.80    100


* Differences favoring the boys.




